Chapter 1
Introduction 



The NVAX PLUS CPU is a high-performance, single-chip implementation of the VAX architecture. 
It is partitioned into multiple sections which cooperate to execute the VAX base instruction group. 
The CPU chip includes the first levels of the memory subsystem hierarchy in an on-chip virtual 
instruction cache and an on-chip physical instruction and data cache, as well as the controller 
for a large second-level cache implemented in static RAMs on the CPU module. 

The NVAX Plus chip is an NVAX core with an EVAX external interface. Microcode changes are 
also required to support the EVAX interlocks and to input from serial ROM at startup. Most of 
the CBOX-MBOX interface section is reused. The CBOX arbitration logic is redesigned to control 
the EDAL interface. Cache fills and coherency transactions are controlled by EDAL system logic 
with only a single CPU request active at a time. 

1.1 Scope and Organization of this Specification 

This specification describes the operation of the NVAX PLUS chip. It contains an Architectural
Summary, a description of the interface to the chip, an overview of the operation of the instruction 
pipeline, and extensive detail about the functional operation of the CBOX section of the chip. 

The IBOX, EBOX, MBOX, FBOX, and Interrupt sections are taken from the NVAX CPU Chip
Functional Specification. These sections retain the high-level description of the section and the
description of the software-visible IPRs, and specify the changes required by NVAX Plus to
accommodate the EVAX interface and Vector option. Sections which aid in understanding the interface
between the NVAX Plus CBOX and NVAX Core are also retained. For a detailed description of
the IBOX, EBOX, MBOX, FBOX, and Interrupt sections, refer to the NVAX CPU Chip Functional
Specification.

In addition, the specification contains discussions of error handling, chip initialization, and
testability features.

1.2 Related Documents 

The following documents are related to or were used in the preparation of this document: 

• NVAX CPU Chip Functional Specification 

• EV3 and EV4 Specification 

• DEC Standard 032 VAX Architecture Standard. 
• NVAX CPU Chip Design Methodology.

1.3 Terminology and Conventions 

1.3.1 Numbering

All numbers are decimal unless otherwise indicated. Where there is ambiguity, numbers other 
than decimal are indicated with the name of the base following the number in parentheses, e.g., 
FF (hex). 

1.3.2 UNPREDICTABLE and UNDEFINED 

RESULTS specified as UNPREDICTABLE may vary from moment to moment, implementation 
to implementation, and instruction to instruction within implementations. Software can never 
depend on results specified as UNPREDICTABLE. 

OPERATIONS specified as UNDEFINED may vary from moment to moment, implementation to 
implementation, and instruction to instruction within implementations. The operation may vary 
in effect from nothing, to stopping system operation. UNDEFINED operations must not cause
the processor to hang, i.e., reach a state from which there is no transition to a normal state in
which the machine executes instructions.

Note the distinction between result and operation. Non-privileged software cannot invoke
UNDEFINED operations.

1.3.3 Ranges and Extents

Ranges are specified by a pair of numbers separated by two periods (..) and are inclusive, e.g., a
range of integers 0..4 includes the integers 0, 1, 2, 3, and 4.

Extents are specified by a pair of numbers in angle brackets separated by a colon and are inclusive, 
e.g., bits <7:3> specify an extent of bits including bits 7, 6, 5, 4, and 3. 
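The extent notation corresponds directly to a shift and a mask. The following Python sketch is illustrative only and is not part of this specification; the helper name extract_extent is hypothetical.

```python
def extract_extent(value: int, high: int, low: int) -> int:
    """Return the field value<high:low>; both end bits are included."""
    if low < 0 or high < low:
        raise ValueError("extent must satisfy high >= low >= 0")
    width = high - low + 1              # <7:3> covers 7 - 3 + 1 = 5 bits
    return (value >> low) & ((1 << width) - 1)


# Bits <7:3> of FF (hex) are bits 7, 6, 5, 4, and 3, all set.
all_ones = extract_extent(0xFF, 7, 3)
# A single-bit extent such as <3:3> selects exactly one bit.
single_bit = extract_extent(0x08, 3, 3)
```

The same helper applies to any register field given in this specification as an extent, for example a field described as occupying bits <23:16>.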

1.3.4 Must be Zero (MBZ) 

Fields specified as Must Be Zero (MBZ) must never be filled by software with a non-zero value. 
If the processor encounters a non-zero value in a field specified as MBZ, a Reserved Operand 
exception occurs. 

1.3.5 Should be Zero (SBZ)

Fields specified as Should Be Zero (SBZ) should be filled by software with a zero value. These 
fields may be used at some future time. Non-zero values in SBZ fields produce UNPREDICTABLE 
results. 
1.3.6 Register Format Notation

This specification contains a number of figures that show the format of various registers, followed
by a description of each field. In general, the fields on the register are labeled with either a name
or a mnemonic. The description of each field includes the name or mnemonic, the bit extent,
and the type. An example of a register is shown in Figure 1-1. Table 1-1 is an example of the
description of the fields in this register.



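Figure 1-1: Register Format Example

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 1  1  0  0| 0  0  0  0|       FAULT_CMD       | x  x  x  x|IE| 0  0  0| 0  0  0  0| 0|  |  |  |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
                                                                                        |  |  |
                                                                                TRAP ---+  |  |
                                                                           INTERRUPT ------+  |
                                                                           BUS_ERROR ---------+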

Table 1-1: Register Field Description Example

Name       Bit(s)  Type  Description
BUS_ERROR  0       WC,0  The BUS_ERROR bit is set when a bus error is detected.
INTERRUPT  1       WC,0  The INTERRUPT bit is set when an error that is reported as an
                         interrupt is detected.
TRAP       2       WC,0  The TRAP bit is set when an error that is reported as a trap is
                         detected.
IE         11      RW,0  The IE bit enables error reporting interrupts. When IE is 0,
                         interrupts are disabled. When IE is 1, interrupts are enabled.
FAULT_CMD  23:16   RO    The FAULT_CMD field latches the command that was in progress
                         when an error is detected.



The "Type" column in the field description includes both the actual type of the field, and an 
optional initialized value, separated from the type by a comma. The type denotes the functional 
operation of the field, and may be one of the values shown in Table 1-2. If present, the initialized 
value indicates that the field is initialized by hardware or microcode to the specified value at 
powerup. If the initialized value is not present, the field is not initialized at powerup. 



Table 1-2: Register Field Type Notation

Notation  Description
RW        A read-write bit or field. The value may be read and written by software, microcode,
          or hardware.
RO        A read-only bit or field. The value may be read by software, microcode, or hardware.
          It is written by hardware; software or microcode writes are ignored.
WO        A write-only bit or field. The value may be written by software or microcode. It is read
          by hardware and reads by software or microcode return an UNPREDICTABLE result.
WZ        A write-only bit or field. The value may be written by software or microcode. It is read
          by hardware and reads by software or microcode return a 0.
WC        A write-one-to-clear bit. The value may be read by software or microcode. Software or
          microcode writes of a 1 cause the bit to be cleared by hardware. Software or microcode
          writes of a 0 do not modify the state of the bit.
RC        A read-to-clear field. The value is written by hardware and remains unchanged until
          read. The value may be read by software or microcode, at which point, hardware may
          write a new value into the field.
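The write-one-to-clear (WC) behavior is the least intuitive of the types above, so a small software model may help. This Python sketch is an illustration only; the function name write_one_to_clear and the sample register values are hypothetical and do not describe any particular NVAX Plus register.

```python
def write_one_to_clear(current: int, written: int) -> int:
    """Model a software write to a register whose bits are all of type WC.

    A 1 in `written` clears the corresponding bit of `current`;
    a 0 in `written` leaves the corresponding bit unchanged.
    """
    return current & ~written


# Hypothetical example: three WC status bits (bits 2, 1, and 0) are all set.
status = 0b111
# Software writes a 1 to bit 1 only, so only bit 1 is cleared.
status = write_one_to_clear(status, 0b010)
```

A consequence of this type is that software can acknowledge one status bit without disturbing the others, because writing zeros has no effect.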



In addition to named fields in registers, other bits of the register may be labeled with one of the
three symbols listed in Table 1-3. These symbols denote the type of the unnamed fields in the
register.

Table 1-3: Register Field Notation

Notation  Description
0         A "0" in a bit position denotes a register bit that is read as a 0 and ignored on write.
1         A "1" in a bit position denotes a register bit that is read as a 1 and ignored on write.
x         An "x" in a bit position denotes a register bit that does not exist in hardware. The
          value is UNPREDICTABLE when read, and ignored on write.
1.3.7 Timing Diagram Notation

This specification contains a number of timing diagrams that show the timing of various signals,
including NDAL signals. The notation used in these timing diagrams is shown in Figure 1-2.



Figure 1-2: Timing Diagram Notation

[Figure: one waveform symbol for each of the following signal states and transitions]

HIGH
LOW
INTERMEDIATE
VALID_HIGH_OR_LOW
CHANGING
INVALID_BUT_NOT_CHANGING
HIGH_TO_LOW
HIGH_TO_VALID
HIGH_TO_INVALID
INTERMEDIATE_TO_LOW
HIGH_TO_INTERMEDIATE
LOW_TO_HIGH
LOW_TO_VALID
LOW_TO_INVALID
INTERMEDIATE_TO_HIGH
LOW_TO_INTERMEDIATE
VALID_TO_INTERMEDIATE
INVALID_TO_INTERMEDIATE
INTERMEDIATE_TO_VALID
INTERMEDIATE_TO_INVALID
1.4 Revision History

Table 1-4: Revision History

Who          When         Description of change
Mike Uhler   06-Mar-1989  Release for external review.
Mike Uhler   15-Dec-1989  Update for second-pass release.
Gil Wolrich  15-Nov-1990  NVAX PLUS release for external review.
Chapter 2

Architectural Summary



2.1 Overview 

This chapter provides a summary of the VAX architectural features of the NVAX Plus CPU Chip. 
It is not intended as a complete reference but rather to give an overview of the user-visible 
features. For a complete description of the architecture, consult the VAX Architecture Standard 
(DEC Standard 032). 

2.2 Visible State 

The visible state of the processor consists of memory, both virtual and physical, the general 
registers, the processor status longword (PSL), and the privileged internal processor registers 
(IPRs). 

2.2.1 Virtual Address Space 

The virtual address space is four gigabytes (2**32), separated into three accessible regions (P0,
P1, and S0) and one reserved region, as shown in Figure 2-1.
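Figure 2-1: Virtual Address Space Layout

00000000 +---------------------+
         |                     |  length of P0 Region in pages (P0LR)
         |      P0 Region      |
         |                     |  P0 Region growth direction
3FFFFFFF +---------------------+
40000000 |                     |  P1 Region growth direction
         |      P1 Region      |
         |                     |  length of P1 Region in pages (2**21-P1LR)
7FFFFFFF +---------------------+
80000000 |                     |  length of System Region in pages (SLR)
         |    System Region    |
         |                     |  System Region growth direction
FFFFFDFF +---------------------+
FFFFFE00 |   Reserved Region   |
FFFFFFFF +---------------------+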



2.2.2 Physical Address Space 

The NVAX Plus CPU naturally generates 32-bit physical addresses. This corresponds to a four
gigabyte physical address space as shown in Figure 2-2. Memory space occupies the first
seven-eighths (3.5GB) of the physical address space. I/O space occupies the last one-eighth (512MB)
of the physical address space and can be distinguished from memory space by the fact that bits
<31:29> of the physical address are all ones.
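The memory/I/O distinction reduces to a test of the three high-order address bits. The Python sketch below only illustrates that rule; the function name is_io_space is hypothetical and the code does not model any hardware decode logic.

```python
def is_io_space(physical_address: int) -> bool:
    """Return True when bits <31:29> of a 32-bit physical address are all ones."""
    if not 0 <= physical_address <= 0xFFFFFFFF:
        raise ValueError("physical address must fit in 32 bits")
    return ((physical_address >> 29) & 0b111) == 0b111


# The first I/O space address is E0000000 (hex); everything below it is memory.
MEMORY_BYTES = 0xE0000000                    # seven-eighths of 4GB, i.e. 3.5GB
IO_BYTES = 0x1_0000_0000 - MEMORY_BYTES      # the remaining one-eighth, 512MB
```

The sizes follow from the split point: E0000000 (hex) bytes is 3.5GB, and the remaining 20000000 (hex) bytes is 512MB.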
Figure 2-2: 32-bit Physical Address Space Layout

00000000 +---------------------+
         |    Memory Space     |  3.5 GB
DFFFFFFF +---------------------+
E0000000 |      I/O Space      |  512 MB
FFFFFFFF +---------------------+



In addition to the natural 32-bit physical address, the CPU may be configured to generate 30-bit 
physical addresses. In this mode, only 512MB of memory space can be referenced, as shown in 
Figure 2-3. 



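Figure 2-3: 30-bit Physical Address Space Layout

00000000 +---------------------+
         |    Memory Space     |  512 MB
1FFFFFFF +---------------------+
20000000 |    Inaccessible     |  3.0 GB
         |       Region        |
DFFFFFFF +---------------------+
E0000000 |      I/O Space      |  512 MB
FFFFFFFF +---------------------+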



The translation from 30-bit addresses to 32-bit addresses is accomplished by sign-extending
PA<29> to PA<31:30>. In this mode, the programmer sees a 1GB address space, split evenly
between memory and I/O space, which is mapped to the actual 32-bit physical address space as
shown in Table 2-1. Unless explicitly stated otherwise, addresses that are given in the remainder
of this specification are the full 32-bit addresses (which, of course, may have been generated from
a 30-bit program address via the mapping shown).



Table 2-1: 30-bit Mapping of Program Addresses to 32-bit Hardware Addresses

Program Address     Hardware Address
00000000..1FFFFFFF  00000000..1FFFFFFF
20000000..3FFFFFFF  E0000000..FFFFFFFF
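The mapping in Table 2-1 can be checked with a few lines of code. The following Python sketch models only the address arithmetic described above (sign extension of PA<29> into PA<31:30>); the function name program_to_hardware is hypothetical.

```python
def program_to_hardware(program_address: int) -> int:
    """Map a 30-bit program address to the 32-bit hardware address.

    PA<29> is copied (sign-extended) into PA<31:30>.
    """
    if not 0 <= program_address < (1 << 30):
        raise ValueError("program address must fit in 30 bits")
    if program_address & (1 << 29):
        # PA<29> = 1: the upper half of the 1GB space lands in I/O space.
        return program_address | 0xC0000000
    # PA<29> = 0: the lower half maps to the same address in memory space.
    return program_address


# The four range boundaries of Table 2-1.
boundaries = [program_to_hardware(a)
              for a in (0x00000000, 0x1FFFFFFF, 0x20000000, 0x3FFFFFFF)]
```

Because only two bits are synthesized, the 3.0GB between 20000000 and DFFFFFFF (hex) can never be produced in this mode, which matches the inaccessible region of Figure 2-3.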



2.2.2.1 Physical Address Control Registers 

During powerup, microcode configures the CPU to generate 30-bit physical addresses. Console 
firmware may then reconfigure the CPU to generate either 30-bit or 32-bit physical addresses by 
writing to the MODE bit in the PAMODE and VPAMODE registers, respectively. The PAMODE 
register is shown in Figure 2-4.

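Figure 2-4: PAMODE Register

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 0  0  0  0| 0  0  0  0| 0  0  0  0| 0  0  0  0| 0  0  0  0| 0  0  0  0| 0  0  0  0| 0  0  0|  | :PAMODE
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
                                                                                              |
                                                                                      MODE ---+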



The VPAMODE register is identical in format to the PAMODE register. 

The PAMODE register also determines how PTEs are to be interpreted. In 30-bit mode, PTEs
are interpreted in 21-bit PFN format. In 32-bit mode, PTEs are interpreted in 25-bit PFN
format (although the two upper bits of the PFN field are ignored). The different PTE formats are
described in Section 2.6.4.

2.2.3 Registers 

There are 16 32-bit General Purpose Registers (GPRs). The format is shown in Figure 2-5, and 
the use of each GPR is shown in Table 2-2.

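Figure 2-5: General Purpose Registers

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                                                                               |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+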
Table 2-2: General Purpose Register Usage

GPR     Synonym  Use
R0-R11           General Purpose
R12     AP       Argument Pointer
R13     FP       Frame Pointer
R14     SP       Stack Pointer
R15     PC       Program Counter


The Processor Status Longword (PSL) is a 32-bit register which contains processor state. The
PSL format is shown in Figure 2-6, and the fields of the PSL are shown in Table 2-3.


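Figure 2-6: Processor Status Longword Fields

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|CM|TP|VM|MB|FP|IS| CUR | PRV |MB|     IPL      |          MBZ          |DV|FU|IV| T| N| Z| V| C| :PSL
|  |  |  |Z |D |  | MOD | MOD |Z |              |                       |  |  |  |  |  |  |  |  |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+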




Table 2-3: Processor Status Longword

Name     Bit(s)  Description
CM       31      Compatibility Mode
TP       30      Trace Pending
VM       29      Virtual Machine Mode [1]
FPD      27      First Part Done
IS       26      Interrupt Stack
CUR_MOD  25:24   Current Mode
PRV_MOD  23:22   Previous Mode
IPL      20:16   Interrupt Priority Level
DV       7       Decimal Overflow Trap Enable
FU       6       Floating Underflow Fault Enable
IV       5       Integer Overflow Trap Enable
T        4       Trace Trap Enable
N        3       Negative Condition Code
Z        2       Zero Condition Code
V        1       Overflow Condition Code
C        0       Carry Condition Code

[1] MBZ unless the virtual machine option is implemented
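The field positions in Table 2-3 are enough to split a PSL value into its named fields. The Python sketch below is illustrative only; the names PSL_FIELDS and decode_psl are hypothetical, and the sample PSL value is chosen purely for demonstration.

```python
# (high bit, low bit) of each PSL field, copied from Table 2-3.
PSL_FIELDS = {
    "CM": (31, 31), "TP": (30, 30), "VM": (29, 29), "FPD": (27, 27),
    "IS": (26, 26), "CUR_MOD": (25, 24), "PRV_MOD": (23, 22),
    "IPL": (20, 16), "DV": (7, 7), "FU": (6, 6), "IV": (5, 5),
    "T": (4, 4), "N": (3, 3), "Z": (2, 2), "V": (1, 1), "C": (0, 0),
}


def decode_psl(psl: int) -> dict:
    """Return a dictionary mapping each PSL field name to its value."""
    fields = {}
    for name, (high, low) in PSL_FIELDS.items():
        width = high - low + 1
        fields[name] = (psl >> low) & ((1 << width) - 1)
    return fields


# Sample value: IS set (bit 26) and IPL<20:16> all ones.
sample = decode_psl(0x041F0000)
```

In the sample, decode_psl reports IS = 1 and IPL = 31, with every other field zero. Bits not listed in Table 2-3 (28, 21, and 15:8) are MBZ and are deliberately left out of the dictionary.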
2.3 Data Types

The NVAX Plus CPU supports nine data types: byte, word, longword, quadword, character
string, variable length bit field, F_floating, D_floating, and G_floating. These are summarized in
Figure 2-7.

Figure 2-7: Data Types

 07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+
|                       | :A
+--+--+--+--+--+--+--+--+

Data Type: Byte
Length:    8 bits
Use:       Signed or unsigned integer

 15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                               | :A
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Data Type: Word
Length:    16 bits
Use:       Signed or unsigned integer

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                                                                               | :A
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Data Type: Longword
Length:    32 bits
Use:       Signed or unsigned integer

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                                                                               | :A
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                                                                               | :A+4
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Data Type: Quadword
Length:    64 bits
Use:       Signed integer

 07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+
|                       | :A
+--+--+--+--+--+--+--+--+
|                       | :A+1
+--+--+--+--+--+--+--+--+
            :
+--+--+--+--+--+--+--+--+
|                       | :A+length-1
+--+--+--+--+--+--+--+--+

Data Type: Character String
Length:    0-64K bytes
Use:       Byte string

 31          P+S P+S-1              P P-1          00
+-------------+----------------------+--------------+
|             |                      |              | :A
+-------------+----------------------+--------------+

Data Type: Variable length bit field
Length:    0-32 bits
Use:       Bit string

 15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| S|       exponent        |      fraction      | :A
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                   fraction                    | :A+2
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16

Data Type: F_floating
Length:    32 bits
Use:       Floating point

 15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| S|       exponent        |      fraction      | :A
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                   fraction                    | :A+2
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                   fraction                    | :A+4
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                   fraction                    | :A+6
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
 63 62 61 60|59 58 57 56|55 54 53 52|51 50 49 48

Data Type: D_floating
Length:    64 bits
Use:       Floating point

 15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| S|            exponent            | fraction  | :A
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                   fraction                    | :A+2
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                   fraction                    | :A+4
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                   fraction                    | :A+6
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
 63 62 61 60|59 58 57 56|55 54 53 52|51 50 49 48

Data Type: G_floating
Length:    64 bits
Use:       Floating point



2.4 Instruction Formats and Addressing Modes

VAX instructions consist of a one- or two-byte opcode, followed by zero to six operand specifiers.

2.4.1 Opcode Formats 

An opcode may be either one or two contiguous bytes. The two-byte format begins with an FD 
(hex) byte and is followed by a second opcode byte. The one-byte format is indicated by an opcode 
byte whose value is anything other than FD (hex). The one- or two-byte opcode format is shown 
in Figure 2-8. 

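Figure 2-8: Opcode Formats

                   07 06 05 04|03 02 01 00
                  +--+--+--+--+--+--+--+--+
One-byte opcode:  |        opcode         | :A
                  +--+--+--+--+--+--+--+--+

                   15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
                  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
Two-byte opcode:  |        opcode         |          FD           | :A
                  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+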
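The opcode format rule amounts to a single comparison on the first byte of the instruction stream. The Python sketch below illustrates that rule only; the function names opcode_length and split_opcode are hypothetical, and the code is not a model of the NVAX Plus instruction decoder.

```python
def opcode_length(instruction_stream: bytes) -> int:
    """Return 2 when the first byte is FD (hex), otherwise 1."""
    if not instruction_stream:
        raise ValueError("instruction stream is empty")
    return 2 if instruction_stream[0] == 0xFD else 1


def split_opcode(instruction_stream: bytes):
    """Return (opcode bytes, remaining bytes) for the next instruction."""
    length = opcode_length(instruction_stream)
    if len(instruction_stream) < length:
        raise ValueError("two-byte opcode is truncated")
    return instruction_stream[:length], instruction_stream[length:]


# A one-byte opcode (40 hex) followed by two arbitrary specifier bytes.
one_byte = split_opcode(bytes([0x40, 0x50, 0x51]))
# A two-byte opcode: the FD byte is at the lower address, then 40 hex.
two_byte = split_opcode(bytes([0xFD, 0x40, 0x50, 0x51]))
```

Note that a two-byte opcode written as 40FD in the instruction tables has the FD byte at the lower address, which is why the check is made on the first byte fetched.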



2.4.2 Addressing Modes 

An operand specifier starts with a specifier byte and may be followed by a specifier extension.
Bits <3:0> of the specifier byte contain a GPR number and bits <7:4> of the specifier byte
indicate the addressing mode of the specifier. If the register number in the specifier byte does not
contain 15, the addressing mode is a general register addressing mode. If the register number
in the specifier byte does contain 15, the addressing mode is a PC-relative addressing mode. The
different addressing modes are shown graphically in Figure 2-9. General register addressing
modes are listed in Table 2-4 and PC-relative addressing modes are listed in Table 2-5.

Figure 2-9: Addressing Modes

                    07 06 05 04|03 02 01 00
General register   +--+--+--+--+--+--+--+--+
addressing mode:   |   mode    | register  |
                   +--+--+--+--+--+--+--+--+

                    07 06 05 04|03 02 01 00
PC-relative        +--+--+--+--+--+--+--+--+
addressing mode:   |   mode    | 1  1  1  1|
                   +--+--+--+--+--+--+--+--+
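The split of the specifier byte described above can be written as two bit-field extractions and one comparison. This Python sketch follows that description literally and is illustrative only; the name decode_specifier is hypothetical, and the sketch does not handle specifier extensions or the special treatment of individual modes.

```python
def decode_specifier(specifier_byte: int):
    """Split a specifier byte into (mode, register, is_pc_relative).

    mode           = bits <7:4> of the specifier byte
    register       = bits <3:0> of the specifier byte
    is_pc_relative = True when the register field contains 15 (the PC)
    """
    if not 0 <= specifier_byte <= 0xFF:
        raise ValueError("specifier must be a single byte")
    mode = (specifier_byte >> 4) & 0xF
    register = specifier_byte & 0xF
    return mode, register, register == 15


# 61 (hex): mode 6 with register 1, a general register addressing mode.
general = decode_specifier(0x61)
# 8F (hex): mode 8 with register 15, a PC-relative addressing mode.
pc_relative = decode_specifier(0x8F)
```

The mode value returned here is the index into the Mode column of the addressing mode tables; which table applies is decided by the third element of the tuple.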
Table 2-4: General Register Addressing Modes

                                                   Access
Mode  Name                            Assembler    r m w a v   PC  SP  Indexable?
0-3   literal                         S^#literal   y f f f f   x   x   f
4     index                           i[Rx]        y y y y y   u   y   f
5     register                        Rn           y y y f y   u   uq  f
6     register deferred               (Rn)         y y y y y   u   y   y
7     autodecrement                   -(Rn)        y y y y y   u   y   ux
8     autoincrement                   (Rn)+        y y y y y   p   y   ux
9     autoincrement deferred          @(Rn)+       y y y y y   p   y   ux
A     byte displacement               B^d(Rn)      y y y y y   p   y   y
B     byte displacement deferred      @B^d(Rn)     y y y y y   p   y   y
C     word displacement               W^d(Rn)      y y y y y   p   y   y
D     word displacement deferred      @W^d(Rn)     y y y y y   p   y   y
E     longword displacement           L^d(Rn)      y y y y y   p   y   y
F     longword displacement deferred  @L^d(Rn)     y y y y y   p   y   y

Access Types

r = read
m = modify
w = write
a = address
v = variable bit field

Syntax

i = any indexable address mode
d = displacement
Rn = general register, n = 0 to 15
Rx = general register, n = 0 to 14

Results

y = yes, always valid address mode
f = reserved addressing mode fault
x = logically impossible
p = program counter addressing
u = unpredictable
ud = unpredictable for destination of CALLG, CALLS, JMP and JSB
uq = unpredictable for quad, D/G_floating and field if pos+size > 32
ux = unpredictable if index register = base register
Table 2-5: PC-Relative Addressing Modes

                                                 Access
Mode  Name                        Assembler      r m w a v    PC  SP  Indexable?
8     immediate                   I^#constant    y u u y ud
9     absolute                    @#address      y y y y y            y
A     byte relative               B^address      y y y y y            y
B     byte relative deferred      @B^address     y y y y y            y
C     word relative               W^address      y y y y y            y
D     word relative deferred      @W^address     y y y y y            y
E     longword relative           L^address      y y y y y            y
F     longword relative deferred  @L^address     y y y y y            y

For notation, refer to the key in Table 2-4.



2.4.3 Branch Displacements 

Branch instructions contain a one- or two-byte signed branch displacement after the final specifier 
(if any). The branch displacement is shown in Figure 2-10. 

Figure 2-10: Branch Displacements

                    07 06 05 04|03 02 01 00
Signed byte        +--+--+--+--+--+--+--+--+
displacement:      |     displacement      |
                   +--+--+--+--+--+--+--+--+

                    15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
Signed word        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
displacement:      |                 displacement                  |
                   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
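Because the displacement is signed, it must be sign-extended before it is used in address arithmetic. The Python sketch below shows that step; it assumes, as background from the VAX architecture rather than from this section, that the extended displacement is added to the PC value that follows the displacement. The function names are hypothetical.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement integer."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def branch_target(updated_pc: int, displacement: int, size_bytes: int) -> int:
    """Return the 32-bit branch target.

    updated_pc   -- address of the byte following the displacement (assumption)
    displacement -- raw one- or two-byte displacement from the instruction
    size_bytes   -- 1 for a byte displacement, 2 for a word displacement
    """
    if size_bytes not in (1, 2):
        raise ValueError("branch displacements are one or two bytes")
    offset = sign_extend(displacement, 8 * size_bytes)
    return (updated_pc + offset) & 0xFFFFFFFF


# A byte displacement of FE (hex) is -2, so the branch goes backward.
backward = branch_target(0x00001002, 0xFE, 1)
# A word displacement of 0100 (hex) is +256.
forward = branch_target(0x00001003, 0x0100, 2)
```

The final mask keeps the result within the 32-bit virtual address space described in Section 2.2.1.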



2.5 Instruction Set

The NVAX Plus CPU supports the VAX Base Instruction Group as defined in DEC Standard 032 
plus the optional VAX vector instructions and the virtual machine instructions. These instructions 
are listed in Table 2-6. 
Table 2-6: NVAX Instruction Set

Opcode  Instruction  N Z V C  Exceptions

Integer, Arithmetic and Logical Instructions

58  ADAWI add.rw, sum.mw  * * * *
80  ADDB2 add.rb, sum.mb  * * * *  iov
C0  ADDL2 add.rl, sum.ml  * * * *  iov
A0  ADDW2 add.rw, sum.mw  * * * *  iov
81  ADDB3 add1.rb, add2.rb, sum.wb  * * * *  iov
C1  ADDL3 add1.rl, add2.rl, sum.wl  * * * *  iov
A1  ADDW3 add1.rw, add2.rw, sum.ww  * * * *  iov
D8  ADWC add.rl, sum.ml  * * * *  iov
78  ASHL cnt.rb, src.rl, dst.wl  * * * 0  iov
79  ASHQ cnt.rb, src.rq, dst.wq  * * * 0  iov
8A  BICB2 mask.rb, dst.mb  * * 0 -
CA  BICL2 mask.rl, dst.ml  * * 0 -
AA  BICW2 mask.rw, dst.mw  * * 0 -
8B  BICB3 mask.rb, src.rb, dst.wb  * * 0 -
CB  BICL3 mask.rl, src.rl, dst.wl  * * 0 -
AB  BICW3 mask.rw, src.rw, dst.ww  * * 0 -
88  BISB2 mask.rb, dst.mb  * * 0 -
C8  BISL2 mask.rl, dst.ml  * * 0 -
A8  BISW2 mask.rw, dst.mw  * * 0 -
89  BISB3 mask.rb, src.rb, dst.wb  * * 0 -
C9  BISL3 mask.rl, src.rl, dst.wl  * * 0 -
A9  BISW3 mask.rw, src.rw, dst.ww  * * 0 -
93  BITB mask.rb, src.rb  * * 0 -
D3  BITL mask.rl, src.rl  * * 0 -
B3  BITW mask.rw, src.rw  * * 0 -
94  CLRB dst.wb  0 1 0 -
D4  CLRL{=F} dst.wl  0 1 0 -
7C  CLRQ{=D=G} dst.wq  0 1 0 -
B4  CLRW dst.ww  0 1 0 -
91  CMPB src1.rb, src2.rb  * * 0 *
D1  CMPL src1.rl, src2.rl  * * 0 *
B1  CMPW src1.rw, src2.rw  * * 0 *
98  CVTBL src.rb, dst.wl  * * 0 0
99  CVTBW src.rb, dst.ww  * * 0 0
F6  CVTLB src.rl, dst.wb  * * * 0  iov
F7  CVTLW src.rl, dst.ww  * * * 0  iov
33  CVTWB src.rw, dst.wb  * * * 0  iov
32  CVTWL src.rw, dst.wl  * * 0 0
97  DECB dif.mb  * * * *  iov
D7  DECL dif.ml  * * * *  iov
B7  DECW dif.mw  * * * *  iov
86  DIVB2 divr.rb, quo.mb  * * * 0  iov, idvz
C6  DIVL2 divr.rl, quo.ml  * * * 0  iov, idvz
A6  DIVW2 divr.rw, quo.mw  * * * 0  iov, idvz
87  DIVB3 divr.rb, divd.rb, quo.wb  * * * 0  iov, idvz
C7  DIVL3 divr.rl, divd.rl, quo.wl  * * * 0  iov, idvz
A7  DIVW3 divr.rw, divd.rw, quo.ww  * * * 0  iov, idvz
7B  EDIV divr.rl, divd.rq, quo.wl, rem.wl  * * * 0  iov, idvz
7A  EMUL mulr.rl, muld.rl, add.rl, prod.wq  * * 0 0
96  INCB sum.mb  * * * *  iov
D6  INCL sum.ml  * * * *  iov
B6  INCW sum.mw  * * * *  iov
92  MCOMB src.rb, dst.wb  * * 0 -
D2  MCOML src.rl, dst.wl  * * 0 -
B2  MCOMW src.rw, dst.ww  * * 0 -
8E  MNEGB src.rb, dst.wb  * * * *  iov
CE  MNEGL src.rl, dst.wl  * * * *  iov
AE  MNEGW src.rw, dst.ww  * * * *  iov
90  MOVB src.rb, dst.wb  * * 0 -
D0  MOVL src.rl, dst.wl  * * 0 -
7D  MOVQ src.rq, dst.wq  * * 0 -
B0  MOVW src.rw, dst.ww  * * 0 -
9B  MOVZBW src.rb, dst.ww  0 * 0 -
9A  MOVZBL src.rb, dst.wl  0 * 0 -
3C  MOVZWL src.rw, dst.wl  0 * 0 -
84  MULB2 mulr.rb, prod.mb  * * * 0  iov
C4  MULL2 mulr.rl, prod.ml  * * * 0  iov
A4  MULW2 mulr.rw, prod.mw  * * * 0  iov
85  MULB3 mulr.rb, muld.rb, prod.wb  * * * 0  iov
C5  MULL3 mulr.rl, muld.rl, prod.wl  * * * 0  iov
A5  MULW3 mulr.rw, muld.rw, prod.ww  * * * 0  iov
DD  PUSHL src.rl, {-(SP).wl}  * * 0 -
9C  ROTL cnt.rb, src.rl, dst.wl  * * 0 -
D9  SBWC sub.rl, dif.ml  * * * *  iov
82  SUBB2 sub.rb, dif.mb  * * * *  iov
C2  SUBL2 sub.rl, dif.ml  * * * *  iov
A2  SUBW2 sub.rw, dif.mw  * * * *  iov
83  SUBB3 sub.rb, min.rb, dif.wb  * * * *  iov
C3  SUBL3 sub.rl, min.rl, dif.wl  * * * *  iov
A3  SUBW3 sub.rw, min.rw, dif.ww  * * * *  iov
95  TSTB src.rb  * * 0 0
D5  TSTL src.rl  * * 0 0
B5  TSTW src.rw  * * 0 0
8C  XORB2 mask.rb, dst.mb  * * 0 -
CC  XORL2 mask.rl, dst.ml  * * 0 -
AC  XORW2 mask.rw, dst.mw  * * 0 -
8D  XORB3 mask.rb, src.rb, dst.wb  * * 0 -
CD  XORL3 mask.rl, src.rl, dst.wl  * * 0 -
AD  XORW3 mask.rw, src.rw, dst.ww  * * 0 -

Address Instructions

9E  MOVAB src.ab, dst.wl  * * 0 -
DE  MOVAL{=F} src.al, dst.wl  * * 0 -
7E  MOVAQ{=D=G} src.aq, dst.wl  * * 0 -
3E  MOVAW src.aw, dst.wl  * * 0 -
9F  PUSHAB src.ab, {-(SP).wl}  * * 0 -
DF  PUSHAL{=F} src.al, {-(SP).wl}  * * 0 -
7F  PUSHAQ{=D=G} src.aq, {-(SP).wl}  * * 0 -
3F  PUSHAW src.aw, {-(SP).wl}  * * 0 -

Variable-Length Bit Field Instructions

EC  CMPV pos.rl, size.rb, base.vb, {field.rv}, src.rl  * * 0 *  rsv
ED  CMPZV pos.rl, size.rb, base.vb, {field.rv}, src.rl  * * 0 *  rsv
EE  EXTV pos.rl, size.rb, base.vb, {field.rv}, dst.wl  * * 0 -  rsv
EF  EXTZV pos.rl, size.rb, base.vb, {field.rv}, dst.wl  * * 0 -  rsv
F0  INSV src.rl, pos.rl, size.rb, base.vb, {field.wv}  - - - -  rsv
EB  FFC startpos.rl, size.rb, base.vb, {field.rv}, findpos.wl  0 * 0 0  rsv
EA  FFS startpos.rl, size.rb, base.vb, {field.rv}, findpos.wl  0 * 0 0  rsv

Control Instructions

9D  ACBB limit.rb, add.rb, index.mb, displ.bw  * * * -  iov
F1  ACBL limit.rl, add.rl, index.ml, displ.bw  * * * -  iov
3D  ACBW limit.rw, add.rw, index.mw, displ.bw  * * * -  iov
F3  AOBLEQ limit.rl, index.ml, displ.bb  * * * -  iov
F2  AOBLSS limit.rl, index.ml, displ.bb  * * * -  iov
1E  BCC{=BGEQU} displ.bb  - - - -
1F  BCS{=BLSSU} displ.bb  - - - -
13  BEQL{=BEQLU} displ.bb  - - - -
18  BGEQ displ.bb  - - - -
14  BGTR displ.bb  - - - -
1A  BGTRU displ.bb  - - - -
15  BLEQ displ.bb  - - - -
1B  BLEQU displ.bb  - - - -
19  BLSS displ.bb  - - - -
12  BNEQ{=BNEQU} displ.bb  - - - -
1C  BVC displ.bb  - - - -
1D  BVS displ.bb  - - - -
E1  BBC pos.rl, base.vb, displ.bb, {field.rv}  - - - -  rsv
E0  BBS pos.rl, base.vb, displ.bb, {field.rv}  - - - -  rsv
E5  BBCC pos.rl, base.vb, displ.bb, {field.mv}  - - - -  rsv
E3  BBCS pos.rl, base.vb, displ.bb, {field.mv}  - - - -  rsv
E4  BBSC pos.rl, base.vb, displ.bb, {field.mv}  - - - -  rsv
E2  BBSS pos.rl, base.vb, displ.bb, {field.mv}  - - - -  rsv
E7  BBCCI pos.rl, base.vb, displ.bb, {field.mv}  - - - -  rsv
E6  BBSSI pos.rl, base.vb, displ.bb, {field.mv}  - - - -  rsv
E9  BLBC src.rl, displ.bb  - - - -
E8  BLBS src.rl, displ.bb  - - - -
11  BRB displ.bb  - - - -
31  BRW displ.bw  - - - -
10  BSBB displ.bb, {-(SP).wl}  - - - -
30  BSBW displ.bw, {-(SP).wl}  - - - -
8F  CASEB selector.rb, base.rb, limit.rb, displ.bw-list  * * 0 *
CF  CASEL selector.rl, base.rl, limit.rl, displ.bw-list  * * 0 *
AF  CASEW selector.rw, base.rw, limit.rw, displ.bw-list  * * 0 *
17  JMP dst.ab  - - - -
16  JSB dst.ab, {-(SP).wl}  - - - -
05  RSB {(SP)+.rl}  - - - -
F4  SOBGEQ index.ml, displ.bb  * * * -  iov
F5  SOBGTR index.ml, displ.bb  * * * -  iov
Procedure Call Instructions

FA  CALLG arglist.ab, dst.ab, {-(SP).w*}  0 0 0 0  rsv
FB  CALLS numarg.rl, dst.ab, {-(SP).w*}  0 0 0 0  rsv
04  RET {(SP)+.r*}  * * * *  rsv

Miscellaneous Instructions

B9  BICPSW mask.rw  * * * *  rsv
B8  BISPSW mask.rw  * * * *  rsv
03  BPT {-(KSP).w*}  0 0 0 0
00  HALT {-(KSP).w*}  - - - -  prv
0A  INDEX subscript.rl, low.rl, high.rl, size.rl, indexin.rl, indexout.wl  * * 0 0  sub
DC  MOVPSL dst.wl  - - - -
01  NOP  - - - -
BA  POPR mask.rw, {(SP)+.r*}  - - - -
BB  PUSHR mask.rw, {-(SP).w*}  - - - -
FC  XFC {unspecified operands}  0 0 0 0

Queue Instructions

5C  INSQHI entry.ab, header.aq  0 * 0 *  rsv
5D  INSQTI entry.ab, header.aq  0 * 0 *  rsv
0E  INSQUE entry.ab, pred.ab  * * 0 *
5E  REMQHI header.aq, addr.wl  0 * * *  rsv
5F  REMQTI header.aq, addr.wl  0 * * *  rsv
0F  REMQUE entry.ab, addr.wl  * * * *
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Table 2-6 (Cont.): NVAX Instruction Set 



Opcode 


Instruction 


N 


z 


V 


c 


Exceptions 


Operating System Support Instructions 


BD 


CHME param.rw, {-(ySP).w*} 


0 


0 


0 


0 




BC 


CHMK param.rw, {-(ySP).w*} 


0 


0 


0 


0 




BE 


CHMS param.rw, {-(ySP).w*} 


0 


0 


0 


0 




BF 


CHMU param.rw, {-(ySP).w*) 


0 


0 


0 


0 




06 


LDPCTX {PCB.r*, -(KSP).w*} 


- 


- 


- 


- 


rsv, prv 


DB 


MFPR procreg.rl, dstwl 


* 


* 


0 




rsv, prv 


DA 


MTPR srcrl, procreg.rl 


* 


* 


0 




rsv, prv 


OC 


PROBER mode.rb, len.rw, base.ab 


0 


* 


0 






0D 


PROBEW mode.rb, len.rw, base.ab 


0 




0 






02 


REI KSPH.r*} 


* 


* 


* 


* 


rsv 


07 


SVPCTX {(SP)+.r*, PCB.w*} 










prv 


Character String Instructions 


29    CMPC3 len.rw, src1addr.ab, src2addr.ab              * * 0 *
2D    CMPC5 src1len.rw, src1addr.ab, fill.rb,             * * 0 *
      src2len.rw, src2addr.ab
3A    LOCC char.rb, len.rw, addr.ab                       0 * 0 0
28    MOVC3 len.rw, srcaddr.ab, dstaddr.ab, {R0-5.wl}     0 1 0 0
2C    MOVC5 srclen.rw, srcaddr.ab, fill.rb, dstlen.rw,    * * 0 *
      dstaddr.ab, {R0-5.wl}
2A    SCANC len.rw, addr.ab, tbladdr.ab, mask.rb          0 * 0 0
3B    SKPC char.rb, len.rw, addr.ab                       0 * 0 0
2B    SPANC len.rw, addr.ab, tbladdr.ab, mask.rb          0 * 0 0
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Table 2-6 (Cont.): NVAX Instruction Set 



Opcode  Instruction  N Z V C  Exceptions


Floating Point Instructions 


60    ADDD2 add.rd, sum.md             * * 0 0  rsv, fov, fuv
40    ADDF2 add.rf, sum.mf             * * 0 0  rsv, fov, fuv
40FD  ADDG2 add.rg, sum.mg             * * 0 0  rsv, fov, fuv
61    ADDD3 add1.rd, add2.rd, sum.wd   * * 0 0  rsv, fov, fuv
41    ADDF3 add1.rf, add2.rf, sum.wf   * * 0 0  rsv, fov, fuv
41FD  ADDG3 add1.rg, add2.rg, sum.wg   * * 0 0  rsv, fov, fuv
71    CMPD src1.rd, src2.rd            * * 0 0  rsv
51    CMPF src1.rf, src2.rf            * * 0 0  rsv
51FD  CMPG src1.rg, src2.rg            * * 0 0  rsv
6C    CVTBD src.rb, dst.wd             * * 0 0
4C    CVTBF src.rb, dst.wf             * * 0 0
4CFD  CVTBG src.rb, dst.wg             * * 0 0
68    CVTDB src.rd, dst.wb             * * * 0  rsv, iov
76    CVTDF src.rd, dst.wf             * * 0 0  rsv, fov
6A    CVTDL src.rd, dst.wl             * * * 0  rsv, iov
69    CVTDW src.rd, dst.ww             * * * 0  rsv, iov
48    CVTFB src.rf, dst.wb             * * * 0  rsv, iov
56    CVTFD src.rf, dst.wd             * * 0 0  rsv
99FD  CVTFG src.rf, dst.wg             * * 0 0  rsv
4A    CVTFL src.rf, dst.wl             * * * 0  rsv, iov
49    CVTFW src.rf, dst.ww             * * * 0  rsv, iov
48FD  CVTGB src.rg, dst.wb             * * * 0  rsv, iov
33FD  CVTGF src.rg, dst.wf             * * 0 0  rsv, fov, fuv
4AFD  CVTGL src.rg, dst.wl             * * * 0  rsv, iov
49FD  CVTGW src.rg, dst.ww             * * * 0  rsv, iov
6E    CVTLD src.rl, dst.wd             * * 0 0
4E    CVTLF src.rl, dst.wf             * * 0 0
4EFD  CVTLG src.rl, dst.wg             * * 0 0
6D    CVTWD src.rw, dst.wd             * * 0 0
4D    CVTWF src.rw, dst.wf             * * 0 0
4DFD  CVTWG src.rw, dst.wg             * * 0 0
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Table 2-6 (Cont.): NVAX Instruction Set 



Opcode  Instruction  N Z V C  Exceptions


Floating Point Instructions 


6B    CVTRDL src.rd, dst.wl            * * * 0  rsv, iov
4B    CVTRFL src.rf, dst.wl            * * * 0  rsv, iov
4BFD  CVTRGL src.rg, dst.wl            * * * 0  rsv, iov
66    DIVD2 divr.rd, quo.md            * * 0 0  rsv, fov, fuv, fdvz
46    DIVF2 divr.rf, quo.mf            * * 0 0  rsv, fov, fuv, fdvz
46FD  DIVG2 divr.rg, quo.mg            * * 0 0  rsv, fov, fuv, fdvz
67    DIVD3 divr.rd, divd.rd, quo.wd   * * 0 0  rsv, fov, fuv, fdvz
47    DIVF3 divr.rf, divd.rf, quo.wf   * * 0 0  rsv, fov, fuv, fdvz
47FD  DIVG3 divr.rg, divd.rg, quo.wg   * * 0 0  rsv, fov, fuv, fdvz
72    MNEGD src.rd, dst.wd             * * 0 0  rsv
52    MNEGF src.rf, dst.wf             * * 0 0  rsv
52FD  MNEGG src.rg, dst.wg             * * 0 0  rsv
70    MOVD src.rd, dst.wd              * * 0 -  rsv
50    MOVF src.rf, dst.wf              * * 0 -  rsv
50FD  MOVG src.rg, dst.wg              * * 0 -  rsv
64    MULD2 mulr.rd, prod.md           * * 0 0  rsv, fov, fuv
44    MULF2 mulr.rf, prod.mf           * * 0 0  rsv, fov, fuv
44FD  MULG2 mulr.rg, prod.mg           * * 0 0  rsv, fov, fuv
65    MULD3 mulr.rd, muld.rd, prod.wd  * * 0 0  rsv, fov, fuv
45    MULF3 mulr.rf, muld.rf, prod.wf  * * 0 0  rsv, fov, fuv
45FD  MULG3 mulr.rg, muld.rg, prod.wg  * * 0 0  rsv, fov, fuv
62    SUBD2 sub.rd, dif.md             * * 0 0  rsv, fov, fuv
42    SUBF2 sub.rf, dif.mf             * * 0 0  rsv, fov, fuv
42FD  SUBG2 sub.rg, dif.mg             * * 0 0  rsv, fov, fuv
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Table 2-6 (Cont.): NVAX Instruction Set 



Opcode  Instruction  N Z V C  Exceptions


Floating Point Instructions 


63    SUBD3 sub.rd, min.rd, dif.wd     * * 0 0  rsv, fov, fuv
43    SUBF3 sub.rf, min.rf, dif.wf     * * 0 0  rsv, fov, fuv
43FD  SUBG3 sub.rg, min.rg, dif.wg     * * 0 0  rsv, fov, fuv
73    TSTD src.rd                      * * 0 0  rsv
53    TSTF src.rf                      * * 0 0  rsv
53FD  TSTG src.rg                      * * 0 0  rsv


Microcode-Assisted Emulated Instructions

20    ADDP4 addlen.rw, addaddr.ab, sumlen.rw, sumaddr.ab    * * * 0  rsv, dov
21    ADDP6 add1len.rw, add1addr.ab, add2len.rw,            * * * 0  rsv, dov
      add2addr.ab, sumlen.rw, sumaddr.ab
F8    ASHP cnt.rb, srclen.rw, srcaddr.ab, round.rb,         * * * 0  rsv, dov
      dstlen.rw, dstaddr.ab
35    CMPP3 len.rw, src1addr.ab, src2addr.ab                * * 0 0
37    CMPP4 src1len.rw, src1addr.ab, src2len.rw,            * * 0 0
      src2addr.ab
0B    CRC tbl.ab, inicrc.rl, strlen.rw, stream.ab           * * 0 0
F9    CVTLP src.rl, dstlen.rw, dstaddr.ab                   * * * 0  rsv, dov
36    CVTPL srclen.rw, srcaddr.ab, dst.wl                   * * * 0  rsv, iov
08    CVTPS srclen.rw, srcaddr.ab, dstlen.rw, dstaddr.ab    * * * 0  rsv, dov
09    CVTSP srclen.rw, srcaddr.ab, dstlen.rw, dstaddr.ab    * * * 0  rsv, dov
24    CVTPT srclen.rw, srcaddr.ab, tbladdr.ab, dstlen.rw,   * * * 0  rsv, dov
      dstaddr.ab
26    CVTTP srclen.rw, srcaddr.ab, tbladdr.ab, dstlen.rw,   * * * 0  rsv, dov
      dstaddr.ab
27    DIVP divrlen.rw, divraddr.ab, divdlen.rw,             * * * 0  rsv, dov, ddvz
      divdaddr.ab, quolen.rw, quoaddr.ab
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Table 2-6 (Cont.): NVAX Instruction Set 



Opcode Instruction N Z V C Exceptions 



Microcode-Assisted Emulated Instructions 



38    EDITPC srclen.rw, srcaddr.ab, pattern.ab,             * * * *  rsv, dov
      dstaddr.ab
39    MATCHC objlen.rw, objaddr.ab, srclen.rw,              0 * 0 0
      srcaddr.ab
34    MOVP len.rw, srcaddr.ab, dstaddr.ab                   * * 0 0
2E    MOVTC srclen.rw, srcaddr.ab, fill.rb, tbladdr.ab,     * * 0 *
      dstlen.rw, dstaddr.ab
2F    MOVTUC srclen.rw, srcaddr.ab, esc.rb, tbladdr.ab,     * * * *
      dstlen.rw, dstaddr.ab
25    MULP mulrlen.rw, mulraddr.ab, muldlen.rw,             * * * 0  rsv, dov
      muldaddr.ab, prodlen.rw, prodaddr.ab
22    SUBP4 sublen.rw, subaddr.ab, diflen.rw, difaddr.ab    * * * 0  rsv, dov
23    SUBP6 sublen.rw, subaddr.ab, minlen.rw,               * * * 0  rsv, dov
      minaddr.ab, diflen.rw, difaddr.ab
The notation used for operand specifiers is <name>.<access type><data type>. Implied operands (those locations that are
referenced by the instruction but not specified by an operand) are denoted by curly braces {}.

Access Type 

a = address operand 

b = branch displacement 

m = modified operand (both read and written) 

r = read only operand 

v = if not "Rn", same as a, otherwise R[n+1]'R[n]
w = write only operand 

Data Type 

b = byte 
d = D_floating
f = F_floating
g = G_floating
l = longword
q = quadword 

t = field (used only in implied operands) 
w = word 

* = multiple longwords (used only in implied operands)

Condition Codes Modification

* = conditionally set/cleared
- = not affected
0 = cleared
1 = set

Exceptions 

rsv = reserved operand fault 

iov = integer overflow trap 

idvz = integer divide by zero trap 

fov = floating overflow fault

fuv = floating underflow fault

fdvz = floating divide by zero fault

dov = decimal overflow trap

ddvz = decimal divide by zero trap

sub = subscript range trap

prv = privileged instruction fault

vec = vector unit disabled fault
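
The operand specifier notation can be decoded mechanically. The following Python fragment is only an illustrative sketch: the function name `parse_specifier` and the two lookup tables are hypothetical helpers, and the tables cover only a subset of the access and data types listed in the legend for Table 2-6.

```python
# Illustrative sketch only: decodes "<name>.<access type><data type>" strings
# as used in the operand columns of Table 2-6. Names here are hypothetical.

ACCESS_TYPES = {
    "a": "address operand",
    "b": "branch displacement",
    "m": "modified operand (both read and written)",
    "r": "read only operand",
    "w": "write only operand",
}

DATA_TYPES = {
    "b": "byte",
    "w": "word",
    "l": "longword",
    "q": "quadword",
    "f": "F_floating",
    "d": "D_floating",
    "g": "G_floating",
    "*": "multiple longwords",
}


def parse_specifier(spec):
    """Split a specifier such as 'arglist.ab' into (name, access, data type)."""
    name, _, suffix = spec.rpartition(".")
    if not name or len(suffix) != 2:
        raise ValueError(f"not a <name>.<access><data> specifier: {spec!r}")
    access, data = suffix
    return name, ACCESS_TYPES[access], DATA_TYPES[data]


if __name__ == "__main__":
    for spec in ("arglist.ab", "sum.md", "-(SP).w*"):
        print(spec, "->", parse_specifier(spec))
```

Splitting on the last period keeps implied operands such as `-(SP).w*` intact, because the name part may itself contain punctuation.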
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2.6 Memory Management 

The NVAX Plus CPU Chip supports a four gigabyte (2**32) virtual address space, divided into
two sections, system space and process space. Process space is further subdivided into the P0
region and the P1 region.



2.6.1 Memory Management Control Registers 

Memory management is controlled by three processor registers: Memory Management Enable
(MAPEN), Translation Buffer Invalidate Single (TBIS), and Translation Buffer Invalidate All
(TBIA).

Bit <0> of the MAPEN register enables memory management if written with a 1 and disables 
memory management if written with a 0. The MAPEN register is shown in Figure 2—11. 



Figure 2-11: MAPEN Register

31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
|                                    0 (bits <31:1>)                                     |MME| :MAPEN



The TBIS register controls translation buffer invalidation. Writing a virtual address into TBIS
invalidates any entry which maps that virtual address. The TBIS format is shown in Figure 2-12.

Figure 2-12: TBIS Register 



31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
|                                       Virtual Address                                      | :TBIS



The TBIA register also controls translation buffer invalidation. Writing a zero into TBIA
invalidates the entire translation buffer. The TBIA format is shown in Figure 2-13.



Figure 2-13: TBIA Register 



31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
|                                      0 (bits <31:0>)                                       | :TBIA
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2.6.2 System Space Address Translation 

A virtual address with bit <31> = 1 is an address in the system virtual address space.

System virtual address space is mapped by the System Page Table (SPT), which is defined by 
the System Base Register (SBR) and the System Length Register (SLR). The SBR contains the 
page-aligned physical address of the System Page Table. The SLR contains the size of the SPT 
in longwords, that is, the number of Page Table Entries. The Page Table Entry addressed by the 
System Base Register maps the first page of system virtual address space, that is, virtual byte
address 80000000 (hex). These registers are shown in Figure 2-14.

With a 22-bit SLR, at most 2**22-1 pages in system space may be addressed. As a result, the last
page of system space (beginning at virtual address FFFFFE00 (hex)) is not addressable. This page
is therefore reserved, and a reference to any address in it results in a length violation.

NOTE 

The extended S0 space described above is implemented on the NVAX Plus chip.

NOTE 

When the CPU is configured to generate 30-bit physical addresses, SBR<31:30> are 
ignored. 



Figure 2-14: System Base and Length Registers 
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The system space translation algorithm is shown graphically in Figure 2-15. 
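
As a rough illustration of the system space translation described above, the following Python sketch computes the physical address of the PTE that maps a system virtual address. It assumes 512-byte pages (implied by the last page of system space beginning at FFFFFE00) and one longword per PTE; the function and exception names are hypothetical and are not taken from this specification.

```python
# Illustrative sketch only, not the chip's implementation.
PAGE_SHIFT = 9   # assumption: 512-byte pages (last page starts at FFFFFE00)
PTE_BYTES = 4    # one longword per Page Table Entry


class LengthViolation(Exception):
    """Raised when the virtual page number is outside the page table."""


def system_pte_physical_address(va, sbr, slr):
    """Return the physical address of the PTE mapping system address `va`.

    sbr: physical address of the System Page Table
    slr: number of PTEs in the System Page Table
    """
    if not (va >> 31) & 1:
        raise ValueError("bit <31> is 0: not a system space address")
    vpn = (va & 0x7FFFFFFF) >> PAGE_SHIFT    # page number within system space
    if vpn >= slr:                           # beyond the last PTE
        raise LengthViolation(hex(va))
    return sbr + PTE_BYTES * vpn


if __name__ == "__main__":
    # The PTE addressed by SBR maps virtual address 80000000 (hex).
    print(hex(system_pte_physical_address(0x80000000, 0x1000, 4)))
    print(hex(system_pte_physical_address(0x80000200, 0x1000, 4)))
```

With the largest possible length value of 2**22-1, the page at FFFFFE00 has page number 2**22-1 and therefore always fails the length check, which matches the reserved last page noted in the text.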
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2.6.3 Process Space Address Translation 

A virtual address with bit <31> = 0 is an address in the process virtual address space. Process
space is divided into two equal sized, separately mapped regions. If virtual address bit <30> = 0,
the address is in region P0. If virtual address bit <30> = 1, the address is in region P1.



2.6.3.1 P0 Region Address Translation 

The P0 region of the address space is mapped by the P0 Page Table (P0PT), which is defined by
the P0 Base Register (P0BR) and the P0 Length Register (P0LR). The P0BR contains the
page-aligned system virtual address of the P0 Page Table. The P0LR contains the size of the P0PT in
longwords, that is, the number of Page Table Entries. The Page Table Entry addressed by the P0
Base Register maps the first page of the P0 region of the virtual address space, that is, virtual
byte address 0. The P0 base and length registers are shown in Figure 2-16.
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The P0 space translation algorithm is shown graphically in Figure 2-17. 
Figure 2-16: P0 Base and Length Registers 
Figure 2-17: P0 Space Translation Algorithm
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2.6.3.2 P1 Region Address Translation 

The P1 region of the address space is mapped by the P1 Page Table (P1PT), which is defined
by the P1 Base Register (P1BR) and the P1 Length Register (P1LR). Because P1 space grows
towards smaller addresses, and because a consistent hardware interpretation of the base and
length registers is desirable, P1BR and P1LR describe the portion of P1 space that is NOT
accessible. Note that P1LR contains the number of nonexistent PTEs. P1BR contains the
page-aligned virtual address of what would be the PTE for the first page of P1, that is, virtual byte
address 40000000 (hex). The address in P1BR is not necessarily an address in system space, but
all the addresses of PTEs must be in system space.

The P1 space translation algorithm is shown graphically in Figure 2-19.
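
The difference between the P0 and P1 length checks can be sketched as follows. This Python fragment is illustrative only: it assumes 512-byte pages and longword PTEs, the names are hypothetical, and it returns the system virtual address of the PTE, which would still have to be translated through the System Page Table.

```python
# Illustrative sketch only, not the chip's implementation.
PAGE_SHIFT = 9   # assumption: 512-byte pages
PTE_BYTES = 4    # one longword per Page Table Entry


class LengthViolation(Exception):
    """Raised when the page number falls outside the mapped part of a region."""


def process_pte_virtual_address(va, p0br, p0lr, p1br, p1lr):
    """Return the virtual address of the PTE mapping process address `va`."""
    if (va >> 31) & 1:
        raise ValueError("bit <31> is 1: not a process space address")
    vpn = (va & 0x3FFFFFFF) >> PAGE_SHIFT    # page number within the region
    if (va >> 30) & 1:
        # P1 grows toward smaller addresses: P1LR counts the NONEXISTENT
        # PTEs at the low end, so page numbers below P1LR are invalid.
        if vpn < p1lr:
            raise LengthViolation(hex(va))
        return p1br + PTE_BYTES * vpn
    # P0 grows toward larger addresses: P0LR is the number of PTEs.
    if vpn >= p0lr:
        raise LengthViolation(hex(va))
    return p0br + PTE_BYTES * vpn


if __name__ == "__main__":
    print(hex(process_pte_virtual_address(0x200, 0x80001000, 2, 0x80100000, 0x1FFFFF)))
    print(hex(process_pte_virtual_address(0x7FFFFE00, 0x80001000, 2, 0x80100000, 0x1FFFFF)))
```

Because P1BR points at what would be the PTE for virtual address 40000000 (hex), the same base-plus-index arithmetic serves both regions; only the direction of the length comparison differs.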



Figure 2-18: P1 Base and Length Registers
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Figure 2-19: P1 Space Translation Algorithm 
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2.6.4 Page Table Entry 

If the CPU is configured to generate 30-bit physical addresses, it interprets PTEs in the 21-bit
PFN format shown in Figure 2-20. Conversely, if the CPU is configured to generate 32-bit
physical addresses, it interprets PTEs in the 25-bit PFN format shown in Figure 2-21. Note that
bits <24:23> of the 25-bit PFN format are ignored by the NVAX Plus CPU chip, which implements 
only 32-bit physical addresses. The PTE formats shown below are described in DEC Standard 
032. 



Figure 2-20: PTE Format (21-bit PFN)

Figure 2-21: PTE Format (25-bit PFN)
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Table 2-7: PTE Protection Code Access Matrix 

         Code                    Current Mode
Decimal  Binary  Mnemonic    K    E    S    U    Comment

0        0000    NA          -    -    -    -    no access
1        0001                unpredictable       reserved
2        0010    KW          RW   -    -    -
3        0011    KR          R    -    -    -
4        0100    UW          RW   RW   RW   RW   all access
5        0101    EW          RW   RW   -    -
6        0110    ERKW        RW   R    -    -
7        0111    ER          R    R    -    -
8        1000    SW          RW   RW   RW   -
9        1001    SREW        RW   RW   R    -
10       1010    SRKW        RW   R    R    -
11       1011    SR          R    R    R    -
12       1100    URSW        RW   RW   RW   R
13       1101    UREW        RW   RW   R    R
14       1110    URKW        RW   R    R    R
15       1111    UR          R    R    R    R



Access Modes 

K = Kernel 
E = Executive 
S = Supervisor 
U = User 



Access Types 

R = Read 
W = Write 
- = No access 
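
The protection matrix of Table 2-7 lends itself to a simple table lookup. The sketch below is illustrative only: the dictionary layout and the function name `access_allowed` are hypothetical, and the reserved code 1 is rejected rather than modeled, since its behavior is unpredictable.

```python
# Illustrative sketch only: PTE protection code lookup per Table 2-7.
# Each entry lists the access granted to Kernel, Executive, Supervisor, User.
MODES = "KESU"

PROTECTION = {
    0:  ("",   "",   "",   ""),     # NA
    2:  ("RW", "",   "",   ""),     # KW
    3:  ("R",  "",   "",   ""),     # KR
    4:  ("RW", "RW", "RW", "RW"),   # UW
    5:  ("RW", "RW", "",   ""),     # EW
    6:  ("RW", "R",  "",   ""),     # ERKW
    7:  ("R",  "R",  "",   ""),     # ER
    8:  ("RW", "RW", "RW", ""),     # SW
    9:  ("RW", "RW", "R",  ""),     # SREW
    10: ("RW", "R",  "R",  ""),     # SRKW
    11: ("R",  "R",  "R",  ""),     # SR
    12: ("RW", "RW", "RW", "R"),    # URSW
    13: ("RW", "RW", "R",  "R"),    # UREW
    14: ("RW", "R",  "R",  "R"),    # URKW
    15: ("R",  "R",  "R",  "R"),    # UR
}


def access_allowed(code, mode, write):
    """Return True if `mode` ('K', 'E', 'S' or 'U') may read (or write)."""
    if code == 1:
        raise ValueError("protection code 1 is reserved (unpredictable)")
    granted = PROTECTION[code][MODES.index(mode)]
    return ("W" if write else "R") in granted


if __name__ == "__main__":
    print(access_allowed(14, "U", write=False))   # URKW: user may read
    print(access_allowed(14, "U", write=True))    # URKW: user may not write
```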



2.6.5 Translation Buffer 

In order to save actual memory references when repeatedly referencing pages, the NVAX Plus 
CPU Chip uses a translation buffer to remember successful virtual address translations and page 
status. The translation buffer contains 96 fully associative entries. Both system and process 
references share these entries. 

Translation buffer entries are replaced using a not-last-used (NLU) algorithm. This algorithm 
guarantees that the replacement pointer is not pointing at the last translation buffer entry to be 
used. This is accomplished by rotating the replacement pointer to the next sequential translation
buffer entry if it is pointing to an entry that has just been accessed. Both D-stream and I-stream
references can cause the NLU to cycle. When the translation buffer does not contain a reference's 
virtual address and page status, the machine updates the translation buffer by replacing the 
entry that is selected by the replacement pointer. 
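
The not-last-used policy can be modeled in a few lines. The following Python class is a behavioral sketch under stated assumptions, not a description of the hardware: the class and method names are hypothetical, each entry holds a simple (tag, data) pair, and a freshly filled entry is treated as "just accessed" so that the pointer moves off it.

```python
# Illustrative sketch only: a software model of not-last-used replacement.
class TranslationBuffer:
    def __init__(self, size=96):
        self.entries = [None] * size   # each entry is (tag, data) or None
        self.pointer = 0               # replacement pointer

    def _touch(self, index):
        # Rotate the pointer only if it points at the entry just used.
        if self.pointer == index:
            self.pointer = (self.pointer + 1) % len(self.entries)

    def lookup(self, tag):
        """Return cached data for `tag`, or None on a miss."""
        for index, entry in enumerate(self.entries):
            if entry is not None and entry[0] == tag:
                self._touch(index)
                return entry[1]
        return None

    def fill(self, tag, data):
        """On a miss, replace the entry selected by the replacement pointer."""
        index = self.pointer
        self.entries[index] = (tag, data)
        self._touch(index)             # assumption: a fill counts as an access

    def invalidate_single(self, tag):  # models a write to TBIS
        self.entries = [e if e is None or e[0] != tag else None
                        for e in self.entries]

    def invalidate_all(self):          # models a write to TBIA
        self.entries = [None] * len(self.entries)


if __name__ == "__main__":
    tb = TranslationBuffer(size=4)
    for tag in (1, 2, 3, 4):
        tb.fill(tag, f"pte{tag}")
    tb.lookup(1)         # pointer was on entry 0, so it rotates to entry 1
    tb.fill(5, "pte5")   # replaces the entry for tag 2, not the one just used
    print(tb.lookup(1), tb.lookup(2))
```

The policy only guarantees that the most recently used entry is not the next victim; it makes no attempt to track older usage history, which keeps the pointer logic trivial.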
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2.7 Exceptions and Interrupts 

At certain times during the operation of a system, events within the system require the execution 
of software routines outside the explicit flow of control of instruction execution. An exception is 
an event that is relevant primarily to the currently executing process and normally invokes a 
software routine in the context of the current process. An interrupt is an event which is usually 
due to some activity outside the current process and invokes a software routine outside the context 
of the current process. 

Exceptions and interrupts are reported by constructing a frame on the stack and then dispatching 
to the service routine through an event-specific vector in the System Control Block (SCB). The 
minimum stack frame for any interrupt or exception is a PC/PSL pair as shown in Figure 2—22. 

Figure 2-22: Minimum Exception Stack Frame 



31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
|                                             PC                                             | :(SP)
|                                             PSL                                            |



This minimum stack frame is used for all interrupts. Certain exceptions expand the stack frame 
by pushing additional parameters on the stack above the PC/PSL pair as shown in Figure 2-23. 

Figure 2-23: General Exception Stack Frame 
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What parameters, if any, are pushed on the stack above the PC/PSL pair is a function of the 
specific exception being reported. 

2.7.1 Interrupts 

DEC Standard 032 defines 31 interrupt priority levels, a subset of which is implemented by the
NVAX Plus CPU. When an interrupt request is generated, the hardware compares the request
with the current IPL of the CPU. If the new request is of higher priority, an internal request is
generated. At the completion of the current instruction (or at selected points during the execution of
interruptible instructions), a microcode interrupt handler is invoked to process the request. With
hardware assistance, the microcode handler determines the highest priority interrupt, updates
the IPL, pushes a PC/PSL pair on the stack, and dispatches to a macrocode interrupt handler
through the appropriate location in the SCB.

Of the 31 interrupt priority levels defined by DEC Standard 032, the NVAX Plus CPU makes use
of 23 of them, as shown in Table 2-8.



Table 2-8: Interrupt Priority Levels

IPL (hex)  IPL (decimal)  Interrupt Condition

1F         31             halt_h asserted (non maskable)
1E         30             Unused
1D         29             err_h asserted (or internal hard error detected)
1C         28             Unused
1B         27             Performance Monitoring Interrupt (internally handled by microcode)
1A         26             Internal soft error detected
18-19      24-25          Unused
17         23             irq_h<3> asserted
16         22             irq_h<2> or interval timer (irq_h<2> takes priority)
15         21             irq_h<1> asserted
14         20             irq_h<0> asserted
10-13      16-19          Unused
01-0F      01-15          Software interrupt asserted



2.7.1.1 Interrupt Control Registers

The interrupt system is controlled by three processor registers: the Interrupt Priority Level 
Register (IPL), the Software Interrupt Request Register (SIRR), and the Software Interrupt 
Summary Register (SISR). 

A new interrupt priority level may be loaded into PSL<20:16> by writing the new value to
IPL<4:0>. The IPL register is shown in Figure 2-24.

Figure 2-24: Interrupt Priority Level Register 



31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
|                               0 (bits <31:5>)                                  | PSL<20:16> | :IPL
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A software interrupt may be requested by writing the desired level to SIRR<3:0>. The SIRR 
register is shown in Figure 2-25. 

Figure 2-25: Software Interrupt Request Registers 



31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
|                               0 (bits <31:4>)                                    |Request IPL| :SIRR



The SISR register records pending software interrupt requests at levels 01 through 0F (hex). The
SISR register is shown in Figure 2-26.

Figure 2-26: Software Interrupt Summary Register 
IPL 15 request
IPL 14 request
  ...
IPL 2 request
IPL 1 request
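
The interaction between SIRR, SISR, and the current IPL can be sketched as a small model. This Python fragment is illustrative only. It assumes that SISR bit n records a pending request at level n and that a write of level 0 to SIRR requests nothing; the class and method names are hypothetical.

```python
# Illustrative sketch only: a software model of software interrupt requests.
class SoftwareInterrupts:
    def __init__(self):
        self.sisr = 0                  # assumption: bit n = request at level n

    def write_sirr(self, value):
        """Request a software interrupt at the level given by SIRR<3:0>."""
        level = value & 0xF
        if level:                      # assumption: level 0 requests nothing
            self.sisr |= 1 << level

    def highest_pending(self, current_ipl):
        """Return the highest pending level above current_ipl, else None."""
        for level in range(15, 0, -1):
            if self.sisr & (1 << level):
                return level if level > current_ipl else None
        return None

    def acknowledge(self, level):
        """Clear the request once the interrupt at `level` is taken."""
        self.sisr &= ~(1 << level)


if __name__ == "__main__":
    sw = SoftwareInterrupts()
    sw.write_sirr(3)
    sw.write_sirr(7)
    print(sw.highest_pending(current_ipl=0))   # level 7 is delivered first
    sw.acknowledge(7)
    print(sw.highest_pending(current_ipl=0))   # then level 3
```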



2.7.2 Exceptions 

The VAX architecture recognizes six classes of exceptions. Table 2-9 lists instances of exceptions 
in each class. 



Table 2-9: Exception Classes 



Exception Class                    Instances

Arithmetic traps/faults            Integer overflow trap
                                   Integer divide-by-zero trap
                                   Subscript range trap
                                   Floating overflow fault
                                   Floating divide-by-zero fault
                                   Floating underflow fault

Memory management exceptions       Access control violation fault
                                   Translation not valid fault
                                   M=0 fault

Operand reference exceptions       Reserved addressing mode fault
                                   Reserved operand fault or abort



2—34 Architectural Summary 



DIGITAL CONFIDENTIAL 



NVAX Plus CPU Chip Functional Specification, Revision 0.3, October 1991 



Table 2-9 (Cont.): Exception Classes

Exception Class                    Instances

Instruction execution exceptions   Reserved/privileged instruction fault
                                   Emulated instruction faults
                                   XFC fault
                                   Change-mode trap
                                   Breakpoint fault
                                   Vector disabled fault

Tracing exceptions                 Trace fault

System failure exceptions          Kernel-stack-not-valid abort
                                   Interrupt-stack-not-valid halt
                                   Console error halt
                                   Machine check abort



A trap is an exception that occurs at the end of the instruction that caused the exception. 
Therefore, the PC saved on the stack is the address of the next instruction that would normally 
have been executed. 

A fault is an exception that occurs during an instruction and that leaves the registers and memory 
in a consistent state such that elimination of the fault condition and restarting the instruction 
will give correct results. After the instruction faults, the PC saved on the stack points to the 
instruction that faulted. 

An abort is an exception that occurs during an instruction. An abort leaves the value of
registers and memory UNPREDICTABLE such that the instruction cannot necessarily be correctly
restarted, completed, simulated, or undone. In most instances, the NVAX Plus microcode
attempts to convert an abort into a fault by restoring the state that was present at the start of the
instruction which caused the abort.

The following sections describe only those exceptions which are unique to the NVAX Plus CPU, 
or where DEC Standard 032 is not clear about the implementation. 

2.7.2.1 Arithmetic Exceptions 

Arithmetic exceptions are detected during the execution of instructions that perform integer or 
floating point arithmetic manipulations. Whether the exception is reported as a trap or a fault 
is a function of the specific event. In any case, the exception is reported through SCB vector 34
(hex) with the stack frame shown in Figure 2-27. Table 2-10 lists the exceptions reported by
this mechanism.
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Figure 2-27: Arithmetic Exception Stack Frame 



31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
|                                         Type Code                                          | :(SP)
|                                             PC                                             |
|                                             PSL                                            |



Table 2-10: Arithmetic Exceptions 

     Type Code
Decimal  Hex   Type    Exception

1        1     Trap    Integer overflow
2        2     Trap    Integer divide-by-zero
7        7     Trap    Subscript range
8        8     Fault   Floating overflow
9        9     Fault   Floating divide-by-zero
10       A     Fault   Floating underflow
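
A handler for SCB vector 34 (hex) would typically branch on the type code at the top of the stack frame. The lookup below is a hedged sketch: the dictionary and function names are hypothetical, and the saved-PC rule simply restates the trap and fault definitions given earlier in this section.

```python
# Illustrative sketch only: decoding the arithmetic exception type code.
ARITHMETIC_EXCEPTIONS = {
    1:  ("trap",  "integer overflow"),
    2:  ("trap",  "integer divide-by-zero"),
    7:  ("trap",  "subscript range"),
    8:  ("fault", "floating overflow"),
    9:  ("fault", "floating divide-by-zero"),
    10: ("fault", "floating underflow"),
}


def describe_arithmetic_exception(type_code):
    """Return (name, kind, meaning of the saved PC) for a type code."""
    kind, name = ARITHMETIC_EXCEPTIONS[type_code]
    # Traps save the PC of the next instruction; faults save the PC of
    # the instruction that faulted.
    saved_pc = "next instruction" if kind == "trap" else "faulting instruction"
    return name, kind, saved_pc


if __name__ == "__main__":
    for code in sorted(ARITHMETIC_EXCEPTIONS):
        print(code, describe_arithmetic_exception(code))
```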



2.7.2.2 Memory Management Exceptions 

Memory management exceptions are detected during a memory reference and are always reported
as faults. The memory management exceptions are listed in Table 2-11. All of these exceptions
push the same frame on the stack, as shown in Figure 2-28. The top longword of the stack frame
contains a fault parameter whose bits are described in Table 2-12.



Table 2-11: Memory Management Exceptions

SCB Vector   Exception

20 (hex)     Access control violation
24 (hex)     Translation not valid
3C (hex)     Modify fault
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Figure 2-28: Memory Management Exception Stack Frame 



31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
|                                0 (bits <31:3>)                                     | M| P| L| :(SP)
|                          Some Virtual Address in the Faulting Page                         |
|                                             PC                                             |
|                                             PSL                                            |



Table 2-12: Memory Management Exception Fault Parameter 


Bit   Mnemonic   Meaning

0     L          Length violation
1     P          PTE reference
2     M          Modify or write intent
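
The fault parameter at the top of the memory management stack frame can be unpacked with simple masks. The helper below is a sketch; the function name and the dictionary keys are hypothetical labels for the three bits of Table 2-12.

```python
# Illustrative sketch only: unpacking the memory management fault parameter.
def decode_mm_fault_parameter(param):
    """Return the L, P and M bits of the fault parameter as booleans."""
    return {
        "length_violation": bool(param & 0x1),        # bit 0 (L)
        "pte_reference": bool(param & 0x2),           # bit 1 (P)
        "modify_or_write_intent": bool(param & 0x4),  # bit 2 (M)
    }


if __name__ == "__main__":
    print(decode_mm_fault_parameter(0b101))
```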



2.7.2.3 Emulated Instruction Exceptions 

The NVAX Plus CPU implements the VAX base instruction group. For certain instructions outside
that group, the NVAX Plus microcode provides support for the macrocode emulation of
instructions. There are two types of emulation exceptions, depending on whether PSL<FPD> is set at
the beginning of the instruction.

If PSL<FPD>=0 at the beginning of the instruction, the exception is reported through SCB vector
C8 (hex) as a trap with the stack frame shown in Figure 2-29. The longwords in the stack frame
are described in Table 2-13.
Figure 2-29: Instruction Emulation Trap Stack Frame

Table 2-13: Instruction Emulation Trap Stack Frame

Location     Use

Opcode       Zero-extended opcode of the emulated instruction
Old PC       PC of the opcode of the emulated instruction
Specifiers   Address of the specified operand for specifiers of access type write (.wx) or address
             (.ax). Operand value for specifiers of access type read (.rx). For read-type operands
             whose size is smaller than a longword, the remaining bits are UNPREDICTABLE.
             For those instructions that don't have 8 specifiers, the remaining specifier longwords
             contain UNPREDICTABLE values
New PC       PC of the instruction following the emulated instruction
PSL          PSL saved at the time of the trap





If PSL<FPD>=1 at the beginning of the instruction, the exception is reported through SCB vector 
CC (hex) as a fault with the stack frame shown in Figure 2-30. In this case, PC is that of the 
opcode of the emulated instruction. 
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Figure 2-30: Suspended Emulation Fault Stack Frame 



31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
|                                             PC                                             | :(SP)
|                                             PSL                                            |



2.7.2.4 Machine Check Exceptions 

A machine check exception is reported through SCB vector 04 (hex) when the NVAX Plus CPU 
detects an error condition. The frame pushed on the stack for a machine check indicates the type 
of error and provides internal state information that may help identify the cause of the error. 
The generic machine check stack frame is shown in Figure 2-31. 

Figure 2-31 : Generic Machine Check Stack Frame 



31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
|                    Byte Count of Parameters, Excluding This Longword                       | :(SP)
|                                             PC                                             |
|                                             PSL                                            |



2.7.2.5 Console Halts 

In certain microcode flows, the NVAX Plus microcode may detect an inconsistency in internal 
state, a kernel-mode HALT, or a system reset. In these instances, the microcode initiates a 
hardware restart sequence which passes control to the console program. 

***When a hardware restart sequence is initiated, the NVAX Plus microcode saves the current
CPU state, partially initializes the CPU, and passes control to the console program at the physical
address contained in the CONSOLE_REG register.***

During a hardware restart sequence, the stack pointer is saved in the appropriate stack pointer 
IPR (0 through 4), the current PC is saved in IPR 42 (SAVPC), and the current PSL, halt code, 
and validity flag are saved in IPR 43 (SAVPSL). The formats of SAVPC and SAVPSL are shown
in Figure 2-32.
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Figure 2-32: Console Saved PC and Saved PSL 



31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
|                                          Saved PC                                          | :SAVPC

31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
|                  PSL<31:16>                   |  |  |    Halt Code    |       PSL<7:0>       | :SAVPSL
                                                 |  |
                                     MAPEN<0> ---'  |
                           Invalid SAVPSL if 1 -----'



2.8 System Control Block

The System Control Block (SCB) is a page containing the vectors for servicing interrupts and 
exceptions. The SCB is pointed to by the System Control Block Base Register (SCBB), whose 
format is shown in Figure 2-33. For best performance, SCBB should contain a page-aligned 
address. Microcode forces a longword-aligned SCBB by clearing bits <1:0> of the new value
before loading the register.

NOTE 

When the CPU is configured to generate 30-bit physical addresses, SCBB<31:30> are 
ignored. 



Figure 2-33: System Control Block Base Register 



31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
|                    Physical Page Address of SCB                     |         SBZ        | 0  0| :SCBB



2.8.1 System Control Block Vectors 

An SCB vector is an aligned longword in the SCB through which the NVAX Plus microcode 
dispatches interrupts and exceptions. Each SCB vector has the format shown in Figure 2-34. 
The fields of the vector are described in Table 2-14. 
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Figure 2-34: System Control Block Vector 



31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
|                         longword address of service routine                          | code |



Table 2-14: System Control Block Vector 



Bits   Contents

31:2   Virtual address of the service routine for the interrupt or exception. The routine must be
       longword aligned, as the microcode forces the lower two bits of the address to 00

1:0    Code, interpreted as follows:

       Value  Meaning

       00     The event is to be serviced on the kernel stack unless the CPU is already on the
              interrupt stack, in which case the event is serviced on the interrupt stack
       01     The event is to be serviced on the interrupt stack. If the event is an exception, the
              IPL is raised to 1F (hex)
       10     Unimplemented, results in a console error halt
       11     Unimplemented, results in a console error halt
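
The vector format can be illustrated with a short decoder. The Python sketch below is not the microcode algorithm. It only restates the field layout of Table 2-14, and the function names and returned strings are hypothetical.

```python
# Illustrative sketch only: locating and decoding an SCB vector.
def scb_vector_address(scbb, offset):
    """Physical address of the vector at `offset`; SCBB<1:0> are cleared."""
    return (scbb & ~0x3) + offset


def decode_scb_vector(vector):
    """Split a vector longword into (service routine address, stack choice)."""
    address = vector & 0xFFFFFFFC      # bits <31:2>, low two bits forced to 00
    code = vector & 0x3                # bits <1:0>
    if code == 0b00:
        stack = "kernel stack (interrupt stack if already on it)"
    elif code == 0b01:
        stack = "interrupt stack"
    else:
        stack = "console error halt"   # codes 10 and 11 are unimplemented
    return address, stack


if __name__ == "__main__":
    # 0x34 is the arithmetic exception vector offset given earlier.
    print(hex(scb_vector_address(0x00012003, 0x34)))
    print(decode_scb_vector(0x80004A01))
```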



2.8.2 System Control Block Layout 

The System Control Block layout is shown in Table 2—15. 



Table 2-15: System Control Block Layout 



Vector  Name                                      Type         Param  Notes

00      unused                                    -                   **NVAX passive release**
04      machine check                             abort               parameters reflect machine state;
                                                                      must be serviced on interrupt stack
08      kernel stack not valid                    abort               must be serviced on interrupt stack
0C      unused                                    -                   **NVAX power fail**
10      reserved/privileged instruction           fault
14      customer reserved instruction             fault               XFC instruction
18      reserved operand                          fault/abort         not always recoverable
1C      reserved addressing mode                  fault
20      access control violation/vector           fault               parameters are virtual address,
        alignment fault                                               status code
24      translation not valid                     fault               parameters are virtual address,
                                                                      status code
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Table 2-15 (Cont.): System Control Block Layout 



Vector 



Name 



Type 



Par am Notes 



28 
2C 
30 
34 

38-3C 
40 

44 

48 

4C 

50 
54 
58 

59-5C 
60 
64 
68 

6C-80 

84 
88 
8C 

90-BC 
CO 
C4 
C8 

CC 

DO 
D4 



trace pending 
breakpoint instruction 
unused 

arithmetic trap/fault 
unused 
CHMK 

CHME 

CHMS 

CHMU 

unused - 

soft error notification interrupt 

Performance monitoring counter interrupt 
overflow 



fault 
fault 

trap/fault 

trap 
trap 
trap 
trap 



unused 

hard error notification 
unused 

vector unit disabled 
unused 

software level 1 
software level 2 
software level 3 

software levels 4—15 
interval timer 
unused 

emulation start 

emulation continue 

device vector 
device vector 



interrupt 
fault 

interrupt 
interrupt 
interrupt 

interrupt 
interrupt 

fault 

fault 

interrupt 
interrupt 



0 
0 
0 

0 
0 

10 



compatibility mode in other VAXes 
parameter is type code 



parameter is sign-extended operand 
word 

parameter is sign-extended operand 
word 

parameter is sign-extended operand 
word 

parameter is sign-extended operand 
word 



IPL is 1A (hex) - 

See Chapter 18 for details 

IPL is ID (hex) 

vector instructions 

**80 was NVAXinterprocesBor in- 
terrupt** 

ordinarily used for AST delivery 

ordinarily used for process schedul- 
ing 

IPL is 16 (hex) 

same mode exception, FPD=0; pa- 
rameters are opcode, PC, speci- 
fiers 

same mode exception, FPD=1; no 
parameters 

IPL is 14 (hex) 

IPL is 15 (hex), includes console 
interrupts 
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Table 2-15 (Cont.): System Control Block Layout 






Vector 


Name 


Type 


Par am 


Notes 


D8 


device vector 


interrupt 


0 


IPL is 16 (hex)., includes inter- 
processor interrupts 


DC 


device vector 


interrupt 


0 


IPL is 17 Chex) 


E0-P4 


unused 








F8-FC 


unused 






**F8 was NVAX console receiver- 
FC was console transmitter -IPL 
15** 


100-FFFC 


unused 






**was NVAX Device interrupt vec- 
tors** 



2.9 CPU Identification 

Software may quickly determine on which CPU it is executing in a multi-processor system by
reading the CPUID processor register. The format of this register is shown in Figure 2-35.

Figure 2-35: CPU ID Register

 31                           8 7                  0
+------------------------------+--------------------+
|              0               | CPU Identification | :CPUID
+------------------------------+--------------------+

The CPUID processor register is implemented internally as an 8-bit read-write register. The
source of the CPU ID information is system-specific, and it is the responsibility of the console
firmware at powerup to determine the CPU ID from the system-specific source and write the
correct value to the CPU ID register.



2.10 System Identification

The System Identification Register, IPR 62 (SID), is a read-only register implemented per DEC
Standard 032 in the NVAX Plus CPU. This 32-bit register is used to identify the processor type
and its microcode revision level.
Figure 2-36: System Identification (SID)

 31      24 23      14 13             9  8   7                  0
+----------+----------+----------------+----+--------------------+
| CPU type |    0     | Patch Revision | NS | Microcode revision | :SID
+----------+----------+----------------+----+--------------------+

All four named fields are read-only (RO).
Table 2-16: SID Field Descriptions

Name                Extent  Type  Description
------------------  ------  ----  -----------
Microcode Revision  7:0     RO    This field contains the microcode revision number. This number
                                  is incremented for each pass of the chip.

NS                  8       RO,0  If this bit is a zero, there is either no microcode patch loaded, or
                                  the patch is a standard patch. If this bit is a one, a non-standard
                                  microcode patch is loaded. A non-standard patch is one which goes
                                  beyond the formally released patches, such as a patch used for
                                  performance analysis. This bit is cleared on chip reset.

Patch Revision      13:9    RO,0  If this field is zero, no microcode patch is loaded. If this field is
                                  non-zero, a microcode patch is loaded and this field indicates the
                                  patch number. This field is cleared on chip reset.

CPU Type            31:24   RO    This field contains 23 (decimal), indicating that this is an NVAX Plus
                                  CPU.
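The field extents in Table 2-16 can be illustrated with a short decoding sketch. This is only an example of applying the documented bit positions to a value read from IPR 62; the class and function names are hypothetical.

```python
from typing import NamedTuple


class SID(NamedTuple):
    cpu_type: int            # SID<31:24>
    patch_revision: int      # SID<13:9>
    non_standard_patch: bool # SID<8> (NS)
    microcode_revision: int  # SID<7:0>


def decode_sid(sid: int) -> SID:
    """Split a 32-bit SID value into the fields listed in Table 2-16."""
    return SID(
        cpu_type=(sid >> 24) & 0xFF,
        patch_revision=(sid >> 9) & 0x1F,
        non_standard_patch=bool((sid >> 8) & 0x1),
        microcode_revision=sid & 0xFF,
    )
```

A CPU type of 23 (decimal) in the decoded result identifies an NVAX Plus CPU, and a patch revision of zero means no microcode patch is loaded.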



In order to distinguish between different CPU implementations that use the same CPU chip, the
LNP, along with all VAX processors which use the NVAX Plus chip, implements a System Type
Register (SYS_TYPE). SYS_TYPE resides at the physical address pointed to by CONSOLE_REG
+ 4. This 32-bit read-only register is implemented in the LNP console image. The format
of this register is shown in Figure 2-37.



Figure 2-37: System Type (SYS_TYPE)

 31                                                               0
+-------------+----------------+----------------+------------------+
| System type | Revision level | System Variant | Architectural ID | :SYS_TYPE
+-------------+----------------+----------------+------------------+

All four fields are read-only (RO).



The fields in this register are as follows: 
Architectural ID: This field contains licensing bits which distinguish timesharing systems from
workstations. Because the LNP module is included in a timesharing system, this field contains
01 (hex).

System Variant: This field distinguishes variants of similar systems. Because this is the first 
LNP variant, this field contains 01 (hex). 

Revision level: This field contains the revision number of the LNP console software. The first 
LNP console revision will be 01 (hex). 

System type: This field indicates the type of system. Because this is a Laser system, this field 
contains TBD (hex). 

SID and SYS_TYPE are accessible only to the CPU on the LNP module. Other devices on the 
LSB determine the type of node by reading its Laser Device Registers (LDEV). 

2.11 Process Structure 

A process is a single thread of execution. The context of the current process is contained in the 
Process Control Block (PCB). The PCB is pointed to by the Process Control Block Base register 
(PCBB), which is shown in Figure 2-38. The format of the process control block is shown in 
Figure 2-39. Microcode forces a longword-aligned PCBB by clearing bits <1:0> of the new value 
before loading the register. 

NOTE 

When the CPU is configured to generate 30-bit physical addresses, PCBB<31:30> are 
ignored. 

Figure 2-38: Process Control Block Base Register

 31                                   2  1 0
+--------------------------------------+-----+
| Physical Longword Address of the PCB | 0 0 | :PCBB
+--------------------------------------+-----+
Figure 2-39: Process Control Block

 31                               0
+----------------------------------+
|               KSP                | :PCB
+----------------------------------+
|               ESP                | +4
+----------------------------------+
|               SSP                | +8
+----------------------------------+
|               USP                | +12
+----------------------------------+
|                R0                | +16
+----------------------------------+
|                R1                | +20
+----------------------------------+
|                R2                | +24
+----------------------------------+
|                R3                | +28
+----------------------------------+
|                R4                | +32
+----------------------------------+
|                R5                | +36
+----------------------------------+
|                R6                | +40
+----------------------------------+
|                R7                | +44
+----------------------------------+
|                R8                | +48
+----------------------------------+
|                R9                | +52
+----------------------------------+
|               R10                | +56
+----------------------------------+
|               R11                | +60
+----------------------------------+
|             AP(R12)              | +64
+----------------------------------+
|             FP(R13)              | +68
+----------------------------------+
|                PC                | +72
+----------------------------------+
|               PSL                | +76
+----------------------------------+
|               P0BR               | +80
+-----------+--------+-----+-------+
| 0 0 0 0 0 | ASTLVL | 0 0 | P0LR  | +84
+-----------+--------+-----+-------+
|               P1BR               | +88
+---------------------+------------+
| 0 0 0 0 0 0 0 0 0 0 |    P1LR    | +92
+---------------------+------------+
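The PCB layout in Figure 2-39 is a sequence of 24 longwords, so the byte offset of each slot is four times its position. The sketch below illustrates that arithmetic and unpacks a PCB image held in a byte string. The names are hypothetical, and the bit positions used for ASTLVL (<26:24>), P0LR (<21:0>) and P1LR (<21:0>) are assumptions based on the zero fields shown in the figure and on the usual VAX PCB format, not values stated in this section.

```python
import struct

# Longword slots in PCB order; slot i sits at byte offset 4*i (Figure 2-39).
PCB_SLOTS = (
    ["KSP", "ESP", "SSP", "USP"]
    + [f"R{n}" for n in range(12)]
    + ["AP", "FP", "PC", "PSL", "P0BR", "ASTLVL_P0LR", "P1BR", "P1LR"]
)
PCB_OFFSETS = {name: 4 * i for i, name in enumerate(PCB_SLOTS)}


def parse_pcb(image: bytes) -> dict:
    """Decode the first 96 bytes of a little-endian PCB image."""
    pcb = dict(zip(PCB_SLOTS, struct.unpack("<24I", image[:96])))
    word = pcb.pop("ASTLVL_P0LR")
    pcb["ASTLVL"] = (word >> 24) & 0x7  # assumed position: bits <26:24>
    pcb["P0LR"] = word & 0x3FFFFF       # assumed position: bits <21:0>
    pcb["P1LR"] &= 0x3FFFFF             # assumed position: bits <21:0>
    return pcb
```

With this table, `PCB_OFFSETS["PSL"]` is 76 and `PCB_OFFSETS["P1LR"]` is 92, matching the offsets in the figure.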
2.12 Mailbox Structure

For NVAX Plus LASER (COBRA) bus systems, CSRs exist on external I/O buses which are
accessed via mailbox structures that exist in main memory. Read requests are posted in mailboxes,
and data is returned in memory with status in the following quadword. Mailboxes are allocated
and managed by operating system software (successive operations must not overwrite data which
is still in use).

The I/O module services mailbox requests via four mailbox pointer CSRs (LMBPR) located in
the I/O module's node space. There is one LMBPR for each CPU node. The software sees only one
LMBPR address, but the CPU module replaces the least significant two bits of the address (i.e.
D<2:1>) with the least significant 2 bits of the node ID (i.e. NIOD<1:0>). If a given LMBPR is
in use when it is written, the I/O module does not acknowledge the write: CNF is not asserted.
Processors treat the lack of CNF assertion on a write to the LMBPR as a busy status, and
the write is replayed at a later point in time under software control.

The mailbox pointer CSR has the following format: 



Figure 2-40: LMBPR Register

 37    32 31                  6 5     0
+--------+---------------------+-------+
| unused |         MBX         |  MBZ  |
+--------+---------------------+-------+




Table 2-17: LMBPR Description

Name  Bit(s)  Type  Description
----  ------  ----  -----------
MBX   26      WO    This field contains the 64-byte-aligned physical address of the mailbox
                    data structure in memory where the I/O module can find information to
                    complete the required operation.



The least significant 6 bits of the mailbox address are always 0, to force 64-byte alignment. The
upper six bits are unused in NVAX Plus systems, since NVAX Plus has only a 32-bit physical
address. The I/O module does, however, implement these bits. The NVAX Plus chip always
drives 0's on the upper data lines on I/O space writes, so these bits are written with 0's.
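The alignment rule above can be illustrated with a small sketch that validates a mailbox physical address before it is written to LMBPR and extracts the 26-bit MBX field. The function names are hypothetical; the only facts taken from the text are the 64-byte alignment, the 32-bit physical address limit, and the zero upper bits.

```python
def lmbpr_value(mailbox_pa: int) -> int:
    """Return the value to write to LMBPR for a mailbox at `mailbox_pa`."""
    if not 0 <= mailbox_pa <= 0xFFFFFFFF:
        raise ValueError("NVAX Plus physical addresses are at most 32 bits")
    if mailbox_pa & 0x3F:
        raise ValueError("mailbox must be 64-byte aligned (low 6 bits zero)")
    # The upper bits implemented by the I/O module are always written as 0.
    return mailbox_pa


def mbx_field(lmbpr: int) -> int:
    """Extract the 26-bit MBX field, i.e. address bits <31:6>."""
    return (lmbpr >> 6) & 0x3FFFFFF
```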

LMBPR points to a naturally aligned 64 byte data structure in memory that is constructed by 
software as follows: 
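Figure 2-41: Mailbox Data Structure

      63                                                          0
     +---------+---------+----------+-----------------------------+
QW 0 |   BUS   |   MBZ   |   MASK   |             CMD             |
     +---------+---------+----------+-----------------------------+
QW 1 |                        RBADR<63:0>                         |
     +------------------------------------------------------------+
QW 2 |                        WDATA<63:0>                         |
     +------------------------------------------------------------+
QW 3 |                            MBZ                             |
     +------------------------------------------------------------+
QW 4 |                        RDATA<63:0>                         |
     +------------------------------------------------+-----+-----+
QW 5 |                     STATUS                     | ERR | DON |
     +------------------------------------------------+-----+-----+
QW 6 |                       UNPREDICTABLE                        |
     +------------------------------------------------------------+
QW 7 |                       UNPREDICTABLE                        |
     +------------------------------------------------------------+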

Table 2-18: Mailbox Data Structure Description

Name   Bit(s)  Type  Description
-----  ------  ----  -----------
CMD    32      RW    This field contains the command. The I/O module supports read and
                     write commands.
MASK   8       RW    This field contains the byte mask. The I/O module does not use this
                     field.
BUS    24      RW    This field contains the BUS field, which is used to determine which
                     remote bus this command is meant for.
RBADR  64      RW    This field contains the address to be broadcast on the remote bus.
WDATA  64      RW    This field contains the write data to be broadcast on the remote bus.
RDATA  64      RW    This field contains read data returned from the remote bus.
DON    1       RW    This field contains a status bit which is set by the I/O module once
                     a mailbox operation is complete.
ERR    1       RW    This field contains a status bit which indicates that a mailbox
                     operation failed.

For a more complete description of the Laser system mailbox protocol refer to the IOP and LAMB
module specifications.
2.12.1 Mailbox Operation

To perform an I/O read or write on one of the remote I/O buses, software must create a mailbox data
structure in memory. The command, bus, and address fields must be filled in and the status bits
must be cleared. For a write command the write data field must also be filled in. At this point the
physical address of the mailbox data structure must be written to the LMBPR register to initiate
the I/O operation. A simple I/O space write, such as with a MOVL, could be used to start the
remote I/O operation. However, since writes to LMBPR may be rejected by the I/O module, and no
state is preserved across a macro instruction boundary to notify software of this, another method
must be used. Microcode implements an IPR which can be used to perform the LMBPR
write and return status to software via the condition code bits.

In order for microcode to perform the LMBPR write, it must know the address of the LMBPR register
and the address of the mailbox data structure. Another memory data structure must be created
to pass this information to microcode. This structure is called the Mailbox Pointer and consists
of 2 longwords which begin at a quadword-aligned address.



Figure 2-42: Mailbox Pointer

+-----------------------------------------+
|               LMBPR_ADDR                |
+-----------------------------------+-----+
|              MB_ADDR              | MBZ |
+-----------------------------------+-----+

Table 2-19: Mailbox Pointer Description

Name        Bit(s)  Type  Description
----------  ------  ----  -----------
LMBPR_ADDR  32      WO    This field contains the virtual address of the LMBPR register.
MB_ADDR     32      WO    This field contains the physical address of the mailbox data
                          structure. Since the mailbox data structure must be aligned on a
                          64-byte boundary, bits <5:0> of MB_ADDR must be zero.
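As a rough illustration of the Mailbox Pointer structure, the sketch below packs the two longwords into eight bytes. The function name is hypothetical, and two things are assumptions rather than statements from this section: that LMBPR_ADDR is the longword at the lower address (taken from its position in Figure 2-42), and that the quadword-alignment rule for LMBPR_ADDR noted later in this section should be checked here.

```python
import struct


def pack_mailbox_pointer(lmbpr_addr: int, mb_addr: int) -> bytes:
    """Build the 2-longword Mailbox Pointer (little-endian, as on VAX)."""
    if mb_addr & 0x3F:
        raise ValueError("MB_ADDR<5:0> must be zero (64-byte aligned mailbox)")
    if lmbpr_addr & 0x7:
        raise ValueError("LMBPR_ADDR must be quadword aligned")
    # Assumed order: LMBPR_ADDR first, then MB_ADDR (whose low bits are MBZ).
    return struct.pack("<II", lmbpr_addr, mb_addr)
```

The resulting eight bytes would be placed at a quadword-aligned address, and that address is what software passes to the MAILBOX IPR.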



Once software creates the mailbox data structure and the mailbox pointer structure it may now 
start the I/O operation. An MTPR to the MAILBOX IPR will initiate the I/O operation. The 
MAILBOX IPR has the following format: 

Figure 2-43: MAILBOX Register

 31                          0
+-----------------------------+
|           MBXREG            |
+-----------------------------+
Table 2-20: MAILBOX Register Description

Name    Bit(s)  Type  Description
------  ------  ----  -----------
MBXREG  32      WO    This field contains the address of the mailbox pointer structure.



Microcode will read the MB_ADDR field out of the mailbox pointer structure and then write this 
value to the LMBPR using the address of the LMBPR provided in the mailbox pointer structure. 

NOTE

A non-quadword-aligned address in LMBPR_ADDR results in UNDEFINED operation.

An EDAL store conditional command is used to perform the write. Microcode will then check 
a status bit in the CBOX to determine if the write passed or failed. If the write passed, the 
PSL<Z> bit will be set, otherwise PSL<Z> will be cleared. Software can loop on the MTPR to the 
MAILBOX Register until the write passes. 

After the I/O module has accepted the write to LMBPR it will perform the I/O operation. Software
can now poll the status bits in the mailbox data structure until the I/O operation is complete.
Once the I/O operation is complete the DON bit will be set; if an error occurred, the ERR bit will
also be set. If this was an I/O write operation no further action is needed. If this was an I/O read
operation, software can now fetch the returned data from the RDATA field in the mailbox data
structure.
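The software side of the sequence described in this section (clear status, retry the MTPR until PSL<Z> is set, poll DON, check ERR, fetch RDATA) can be sketched as a small simulation. Everything here is illustrative: the `FakeIOModule` class stands in for the I/O module and microcode, the command encodings are hypothetical, and the mailbox is modelled as a Python object rather than the 64-byte memory structure.

```python
from dataclasses import dataclass

CMD_READ, CMD_WRITE = 0, 1  # hypothetical encodings, for illustration only


@dataclass
class Mailbox:
    cmd: int
    bus: int
    rbadr: int
    wdata: int = 0
    rdata: int = 0
    don: bool = False
    err: bool = False


class FakeIOModule:
    """Toy I/O module that rejects the first `busy` LMBPR writes."""

    def __init__(self, busy: int = 2):
        self.busy = busy
        self.remote = {}  # (bus, address) -> data

    def mtpr_mailbox(self, mbx: Mailbox) -> bool:
        """Model the MTPR to the MAILBOX IPR; the result stands for PSL<Z>."""
        if self.busy:
            self.busy -= 1
            return False  # LMBPR in use: the write is not acknowledged
        key = (mbx.bus, mbx.rbadr)
        if mbx.cmd == CMD_WRITE:
            self.remote[key] = mbx.wdata
        elif key in self.remote:
            mbx.rdata = self.remote[key]
        else:
            mbx.err = True
        mbx.don = True
        return True


def mailbox_io(io: FakeIOModule, mbx: Mailbox, retries: int = 10, polls: int = 1000):
    """Run one mailbox operation; return RDATA for reads, None for writes."""
    mbx.don = mbx.err = False      # status bits must be cleared first
    for _ in range(retries):       # loop on the MTPR until PSL<Z> is set
        if io.mtpr_mailbox(mbx):
            break
    else:
        raise TimeoutError("LMBPR write was never accepted")
    for _ in range(polls):         # poll DON in the mailbox data structure
        if mbx.don:
            break
    else:
        raise TimeoutError("mailbox operation did not complete")
    if mbx.err:
        raise IOError("mailbox operation failed (ERR set)")
    return mbx.rdata if mbx.cmd == CMD_READ else None
```

In this toy model the operation completes as soon as the write is accepted, so the polling loop exits on its first pass; on real hardware DON is set asynchronously by the I/O module.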
2.13 Processor Registers

The processor registers that are implemented by the NVAX Plus CPU chip are logically divided 
into three groups, as follows: 

• Normal — Those IPRs that address individual registers in the NVAX CPU chip or system 
environment. 

• Pcache tag IPRs — The read-write block of IPRs that allow direct access to the Pcache tags. 

• Pcache data parity IPRs — The read-write block of IPRs that allow direct access to the Pcache 
data parity bits. 

Each group of IPRs is distinguished by a particular pattern of bits in the IPR address, as shown
in Figure 2-44.

Figure 2-44: IPR Address Space Decoding

Normal IPR Address

 31   25 24 23                 8 7            0
+-------+--+--------------------+--------------+
|  SBZ  | 0|        SBZ         |  IPR Number  |
+-------+--+--------------------+--------------+

Pcache Tag IPR Address

 31   25 24 23 22 21     13 12  11               5 4     0
+-------+--+--+--+---------+---+------------------+-------+
|  SBZ  | 1| 1| 0|   SBZ   |Set| Pcache Tag Index |  SBZ  |
+-------+--+--+--+---------+---+------------------+-------+

Pcache Data Parity IPR Address

 31   25 24 23 22 21     13 12  11               5 4 3 2   0
+-------+--+--+--+---------+---+------------------+---+-----+
|  SBZ  | 1| 1| 1|   SBZ   |Set| Pcache Tag Index |Sub| SBZ |
+-------+--+--+--+---------+---+------------------+---+-----+

Set = Pcache Set Select (0=left, 1=right)
Sub = Subblock select

The numeric range for each of the three groups is shown in Table 2-21.
Table 2-21: IPR Address Space Decoding

                                IPR Address Range
IPR Group           Mnemonic²   (hex)                Contents
------------------  ---------   -------------------  --------
Normal                          00000000..000000FF¹  256 individual IPRs.
Pcache Tag          PCTAG       01800000..01801FE0¹  256 Pcache tag IPRs, 128 for each Pcache set,
                                                     each separated by 20 (hex) from the previous
                                                     one.
Pcache Data Parity  PCDAP       01C00000..01C01FF8¹  1024 Pcache data parity IPRs, 512 for each
                                                     Pcache set, each separated by 8 (hex) from the
                                                     previous one.

¹ Unused fields in the IPR addresses for these groups should be zero. Neither hardware nor microcode detects and faults on
an address in which these bits are non-zero. Although non-contiguous address ranges are shown for these groups, the entire
IPR address space maps into one of these groups. If these fields are non-zero, the operation of the CPU is UNDEFINED.

² The mnemonic is for the first IPR in the block.



NOTE 

The address ranges shown above are those used by the programmer. When processing
normal IPRs, the microcode shifts the IPR number left by 2 bits for use as an IPR
command address. This positions the IPR number to bits <9:2> and modifies the address
range as seen by the hardware to 0..3FC, with bits <1:0>=00. No shifting is performed
for the other groups of IPR addresses.

Because of the sparse addressing used for IPRs in groups other than the normal group, valid IPR 
addresses are not separated by one. Rather, valid IPR addresses are separated by either 8 or 
20(hex). For example, the IPR address for the first subblock of Pcache data parity is 01C00000 
(hex), and the IPR address for the second subblock of Pcache data parity is 01C00008 (hex). 
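The sparse addressing can be made concrete with a short sketch that forms IPR addresses for the Pcache groups. The base addresses and spacings come from Table 2-21; the placement of set select at bit 12, tag index at bits <11:5>, and subblock select at bits <4:3> is inferred from those ranges and spacings, and the function names are hypothetical.

```python
PCTAG_BASE = 0x01800000  # first Pcache tag IPR
PCDAP_BASE = 0x01C00000  # first Pcache data parity IPR


def pcache_tag_ipr(set_select: int, index: int) -> int:
    """IPR address of a Pcache tag: 2 sets x 128 tags, 20 (hex) apart."""
    if set_select not in (0, 1) or not 0 <= index < 128:
        raise ValueError("set_select must be 0..1 and index 0..127")
    return PCTAG_BASE | (set_select << 12) | (index << 5)


def pcache_data_parity_ipr(set_select: int, index: int, subblock: int) -> int:
    """IPR address of a Pcache data parity entry: 8 (hex) apart."""
    if set_select not in (0, 1) or not 0 <= index < 128 or not 0 <= subblock < 4:
        raise ValueError("field out of range")
    return PCDAP_BASE | (set_select << 12) | (index << 5) | (subblock << 3)


def normal_ipr_command_address(ipr_number: int) -> int:
    """Address seen by hardware for a normal IPR: the number shifted left by 2."""
    if not 0 <= ipr_number <= 0xFF:
        raise ValueError("normal IPR numbers are 0..FF (hex)")
    return ipr_number << 2
```

The extreme values reproduce the table limits: the last tag IPR is 01801FE0 (hex), the last data parity IPR is 01C01FF8 (hex), and normal IPR FF (hex) maps to command address 3FC (hex).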

The NVAX Plus chip does not support the Bcache Tag or Bcache Deallocate IPRs. IPR addresses 
which do not correspond to chip IPRs are NOT converted to I/O space addresses, with IPR reads 
returning UNPREDICTABLE data, and IPR writes not completed. 

The processor registers implemented by the NVAX CPU are shown in Table 2-22.

NOTE 

Many of the processor registers listed in Table 2-22 are used internally by the
microcode during normal operation of the CPU, and are not intended to be referenced by
software except during test or diagnosis of the system. These registers are flagged with
the notation "Testability and diagnostic use only; not for software use in normal
operation". References by software to these registers during normal operation can cause
UNDEFINED behavior of the CPU.
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Table 2-22: Processor Registers 

Number 



Register Name 


Mnemonic (Dec) 


(Hex) 


Type 


Cat 


Kernel Stack Pointer 


KSP 


0 


0 


RW 


1-1 


Ex^putivf Sfcarlt "Point^T 


ESP 


1 


1 


RW 


1-1 


Supervisor Stack Pointer 


SSP 


2 


2 


RW 


1-1 


User Stack Pointer 


USP 


3 


3 


RW 


1-1 


Interrupt Stack Pointer 


ISP 


4 


4 


RW 


1-1 






5 


5 










6 


6 






Reserved 




7 


7 






P0 Base Register 


POBR 


8 


8 


RW 


1-2 


P0 Length Register 


POLR 


9 


9 


RW 


1-2 


Pi Base Register 


P1BR 


10 


A 


RW 


1-2 


Pi Length Register 


P1LR 


11 


B 


RW 


1-2 


System Base Register 


SBR 


12 


c 


RW 


1-2 


SvRteTn LentrfcVi RecnsfceT 


SLR 


13 


X) 


RW 


1-2 


CPU Identification 1 


CPUID 


14 


E 


RW 


2-1 


Reserved 




15 


F 






Ptvipprr Grmfernl Rld^lr Baitf 


PCBB 


16 


10 


RW 


1-1 


SvRt^Tn Control "Block Barp 


SCBB 


17 


11 


RW 


1-1 


Interrupt Priority Level 1 


IPL 


18 


12 


RW 


1-1 


AST Level 1 


ASTLVL 


19 


13 


RW 


1-1 


SflftwflTP Int^TTTlTit RftOUPRt R^PIRtfiT 


SIRR 


20 


14 


W 


1-1 


Software TntftYnint. SummaTV Register 1 


SISR 


21 


15 


RW 


1-1 


ReRPTVftd 




99 


16 






TJ 1 

Keservea 




23 


17 






Interval Counter Control/Status 1,2 


ICCS 


24 


18 


RW 


1-3 


Next Interval Count 


N1CR 


25 


19 


W 


1-3 


Interval Count 


ICR 


26 


1A 


R 


1-3 


Time of Year Register 


TODR 


27 


IB 


RW 


1-3 


Reserved 




28 


1C 






Reserved 




29 


ID 






Reserved 




30 


IE 






Reserved 




31 


IF 






Reserved 




32 


20 







initialized on reset 

2 NVAX Plus implements the full Interval Tuner functionality on chip 
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Table 2-22 (Cont.): Processor Registers 

Number 

Register Name Mnemonic (Bee) (Hex) Type Cat 



Reserved 




33 


21 






Reserved 




34 


22 






Reserved 




35 


23 






Reserved 




36 


24 






Reserved 




37 


25 






Machine Check Error Register. 


MCESR 


38 


26 


W 


2-1 


Reserved 




39 


27 






Reserved 




40 


28 






Reserved 




41 


29 






Console Saved PC 


SAVPC 


42 


2A 


R 


2-1 


Console Saved PSL 


SAVPSL 


43 


2B 


R 


2-1 


Reserved 




44 


2C 






Reserved 




45 


2D 






Reserved 




46 


2E 






Reserved 




47 


2F 






Reserved 




48 


30 






Reserved 




49 


31 






Reserved 




50 


32 






Reserved 




51 


33 






Reserved 




52 


34 






Reserved 




53 


35 






Reserved 




54 


36 






Reserved 




55 


37 






Memory Management Enable 1 


MAPEN 


56 


38 


RW 


1-2 


Translation Buffer Invalidate All 


TBIA 


57 


39 


W 


1-1 


Translation Buffer Invalidate Single 


TBIS 


58 


3A 


W 


1-1 


Reserved 




59 


SB 






Reserved 




60 


3C 






Performance Monitor Enable 1 


PME 


61 


3D 


RW 


2-1 


System Identification 


SID 


62 


3E 


R 


1-1 


Translation Buffer Check 


TBCHK 


63 


3F 


W 


1-1 



1 Initialized on reset 
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Table 2-22 (Cont,): Processor Registers 

Number 



-r> ■ i XT 

.Register Name 


Mnemonic (Dec) 


(He 


Reserved 


64 


40 


Reserved 


65 


41 


Reserved 


66 


42 


Reserved 


67 


43 


Reserved 


68 


44 


Reserved 


69 


45 


Reserved 


70 


46 


Reserved 


71 


47 


Reserved 


72 


48 


Reserved 


73 


49. 


Reserved 


74 


4A 


Reserved 


75 


4B 


Reserved 


76 


4C 


Reserved 


77 


4D 


Reserved 


78 


4E 


Reserved 


79 


4F 


Reserved 


80 


50 


Reserved 


81 


51 


Reserved 


82 


52 


Reserved 


83 


53 


Reserved 


84 


54 


Reserved 


85 


55 


Reserved 


86 


56 


Reserved 


87 


57 


Reserved 


88 


58 


Reserved 


89 


59 


Reserved 


90 


5A 


Reserved 


91 


5B 


Reserved 


92 


5C 


Reserved 


93 


5D 


Reserved 


94 


5E 


Reserved 


95 


5F 
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Table 2-22 (Cont.): Processor Registers 

Number 



Register Name 


Mnemonic (Dec) 


(Hex) 


Type 


Cat 


Reserved 




96 


60 






Reserved 




97 


61 






Reserved 




98 


62 






Reserved 




99 


63 






Reserved for VM 




100 


64 






Reserved for VM 




101 


65 






Reserved for VM 




102 


66 






Reserved 




103 


67 






Reserved 




104 


68 






Reserved 




105 


69 






Reserved 




106 


6A 






Reserved 




107 


6B 






Reserved 




108 


6C 






Reserved 




109 


6D 






Reserved 




110 


6E 






Reserved 




111 


6F 






Reserved 




112 


70 






Reserved 




llo 


71 






Reserved 




114 


72 






Reserved 




115 


73 






Reserved 




116 


74 






Reserved 




117 


75 






Reserved 




118 


76 






Reserved 




119 


77 






Reserved for Ebox 




120 


78 




2-4 


LASER MAILBOX 


LMBOX 


121 


79 


W 


2-1 


Interrupt System Status Register 8 


INTSYS 


122 


7A 


RW 


2-1 


Performance Monitoring Facility Count 


PMFCNT 


123 


7B 


RW 


2-1 


Patchable Control Store Control Register 5 


PCSCR 


124 


7C 


RW 


2-1 


Ebox Control Register 


ECR 


125 


7D 


RW 


2-1 


Mbox TB Tag Fill 8 


MTBTAG 


126 


7E 


W 


2-1 


Mbox TB PTE Fill 8 


MTBPTE 


127 


7F 


W 


2-1 



^Testability and diagnostic use only; not for software use in normal operation 
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Table 2-22 (Cont.): Processor Registers 

Number 



Register Name 


Mnemonic (Dec) 


(Hex) Type 


Cat 


Reserved 


128 


80 


2-4 


Reserved 


129 


81 


O > 

2-4 


Reserved 


1 OA 

lou 


DO 

82 


2-4 


Reserved 


i on 

131 


83 


2-4 


Reserved 


132 


£> A 

84 


2-4 


Reserved 


133 


85 


ft A 

2-4 


Reserved 


1 O A 
1*4 


86 


2-4 


Reserved 


IOC 

135 


87 


2-4 


Reserved 


1 oe 
lob 


oo 
oo 


2-4 


Reserved 


1 0*7 

lo / 


tsy 


2-4 


Reserved 


IOC 

loo 


C A 


O A 

2-4 


Reserved 


t on 

139 


OTl 


2-4 


Reserved 


14U 




2-4 


Reserved 




ol/ 


2-4 


Reserved 


1 A O 

14/ 


Oil/ 


2-4 


Reserved . 


143 


or 


2-4 


Reserved 


144 


yu 


2-4 


Reserved 


1 / c 
140 


yi 


2-4 


Reserved 


14b 


yz 


2-4 


Reserved 


147 


oo 
93 


2-4 


Reserved 


1 AO 

14o 


94 


2-4 


Reserved 


14» 


yo 


2-4 


Reserved 


150 


96 


2-4 


Reserved 


151 


97 


2-4 


Reserved 


152 


98 


2-4 


Reserved 


153 


99 


2-4 


Reserved 


154 


9A 


2-4 


Reserved 


155 


9B 


2-4 


Reserved 


156 


9C 


2-4 


Reserved 


157 


9D 


2-4 


Reserved 


158 


9E 


2-4 


Reserved 


159 


9F 


2-4 
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Table 2-22 (Cont.): Processor Registers 

Number 

Register Name Mnemonic (Dec) (Hex) Type Cat 



BIU Control Register 


BIU.CTL 


160 


AO 


W 


2-3 


Diagnostic Control Register 


DIAG.CTL 


161 


Al 


w 


2-3 


Bcache Error Tag 


BC_TAG 


162 


A2 


R 


2-3 


Reserved for Cbox 




163 


A3 




2-4 


BIU Status 


BIU.STAT 


164 


A4 


W1C 


2-3 


Reserved for Cbox 




165 


A5 




2-4 


BKJ Address 


BKLADDR 


166 


A6 


R 


2-3 


Reserved for Cbox 




167 


A7 




2-4 


Fill Syndrome 


FILL.SYN 


168 


A8 


R 


2-3 


Reserved for Cbox 




169 


AS 




2-4 


Fill Address 


FILL_ADDR170 


AA 


R 


2-3 


Reserved for Cbox 




A / J. 


AB 




2-4 


STxC Pass Fail/CEFSTS 


IPR_STR_ 
COND 


172 


AC 


RW 


2-3 


Reserved for Cbox 




173 


AD 




2-4 


Software ECC 


BCDECC 


174 


AE 


W 


2-3 


Reserved for Cbox 




175 


AF 




2-4 


CONSOLE REG 


CHALT 


176 


B0 


RW 


2-3 


Reserved for Cbox 




177 


Bl 




2-4 


Serial I/O 


SIO 


178 


B2 


RW 


2-3 


Reserved for Cbox 




179 


B3 




2-4 


SROM„oe/SROM_fast 


SOE-IE 


180 


B4 


RW 


2-3 


Reserved for Cbox 




181 


B5 




2-4 


Reserved for Cbox 




182 


"Rfi 
DO 






Reserved for Cbox 




183 


B7 




2-4 


Pack 10 to QW 


QW.PACK 


184 


B8 


W 


2-3 


Clear QW 10 Pack 


CLR_I0_ 
PACK 


185 


B9 


W 


2-3 


Reserved for Cbox 




186 


BA 




2-4 


Reserved for Cbox 




187 


BB 




2-4 


Reserved for Cbox 




188 


BC 




2-4 


Reserved for Cbox 




189 


BD 




2-4 


Reserved for Cbox 




190 


BE 




2-4 


Reserved for Cbox 




191 


BF 




2-4 
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Table 2-22 (Cont.): Processor Registers 

Number 



Register Name 


Mnemonic (Dec) 


(Hex) 


Type 


Cat 


Reserved 




i no 
192 


CO 






Reserved 




19o 


CI 






Reserved 




in/ 
194 


C2 






Reserved 




195 


C3 






Reserved 




19b 


C4 






Reserved 




197 


C5 






Reserved 




1 Qfi 
19o 


C6 






Reserved 




1 QO 

199 


Wl 






Reserved 




2UU 


pe 






Reserved 




2U1 


\Ja 




• 


Reserved 




2U2 


f\ A 

UA 






Reserved 




2Uo 








Reserved 






OO 






Reserved 




one 
2UD 


UJL) 






Reserved 




OAC 


OIL 






Reserved 




2U I 


pn 






VIC Memory Address Register 


vXVLAJV 


one 
2Uo 


DO 


RW 


2-3 


VIC Tag Register 


V lAVjr 




Dl 


T>TT7 

KW 


o o 
2-3 


VIC Data Register 


TTTv A rp* 

VJJALLA 


210 


D2 


RW 


2-3 


Ibox Control and Status Register 


TOOT5 


Oil 

211 


D3 


RW 


2-3 


Ibox Branch Prediction Control Register 3   BPCR      212     D4      RW     2-3
Reserved for Ibox                                     213     D5             2-4
Ibox Backup PC 4                            BPC       214     D6      R      2-3
Ibox Backup PC with RLOG Unwind 4           BPCUNW    215     D7      R      2-3
Reserved for Ibox                                     216     D8             2-4
Reserved for Ibox                                     217     D9             2-4
Reserved for Ibox                                     218     DA             2-4
Reserved for Ibox                                     219     DB             2-4
Reserved for Ibox                                     220     DC             2-4
Reserved for Ibox                                     221     DD             2-4
Reserved for Ibox                                     222     DE             2-4
Reserved for Ibox                                     223     DF             2-4

3 Testability and diagnostic use only; not for software use in normal operation
4 Chip test use only; not for software use

Table 2-22 (Cont.): Processor Registers

                                                      Number
Register Name                               Mnemonic  (Dec)   (Hex)   Type   Cat

Mbox P0 Base Register 3                     MP0BR     224     E0      RW     2-3
Mbox P0 Length Register 3                   MP0LR     225     E1      RW     2-3
Mbox P1 Base Register 3                     MP1BR     226     E2      RW     2-3
Mbox P1 Length Register 3                   MP1LR     227     E3      RW     2-3
Mbox System Base Register 3                 MSBR      228     E4      RW     2-3
Mbox System Length Register 3               MSLR      229     E5      RW     2-3
Mbox Memory Management Enable 3             MMAPEN    230     E6      RW     2-3
Mbox Physical Address Mode                  PAMODE    231     E7      RW     2-3
Mbox MME Address                            MMEADR    232     E8      R      2-3
Mbox MME PTE Address                        MMEPTE    233     E9      R      2-3
Mbox MME Status                             MMESTS    234     EA      R      2-3
Reserved for Mbox                                     235     EB             2-4
Mbox TB Parity Address                      TBADR     236     EC      R      2-3
Mbox TB Parity Status                       TBSTS     237     ED      RW     2-3
Reserved for Mbox                                     238     EE             2-4
Reserved for Mbox                                     239     EF             2-4
Reserved for Mbox                                     240     F0             2-4
Reserved for Mbox                                     241     F1             2-4
Mbox Pcache Parity Address                  PCADR     242     F2      R      2-3
Reserved for Mbox                                     243     F3             2-4
Mbox Pcache Status                          PCSTS     244     F4      RW     2-3
Reserved for Mbox                                     245     F5             2-4
Reserved for Mbox                                     246     F6             2-4
Reserved for Mbox                                     247     F7             2-4
Mbox Pcache Control                         PCCTL     248     F8      RW     2-3
Reserved for Mbox                                     249     F9             2-4
Reserved for Mbox                                     250     FA             2-4
Reserved for Mbox                                     251     FB             2-4
Reserved for Mbox                                     252     FC             2-4
Reserved for Mbox                                     253     FD             2-4
Reserved for Mbox                                     254     FE             2-4
Reserved for Mbox                                     255     FF             2-4

3 Testability and diagnostic use only; not for software use in normal operation

Table 2-22 (Cont.): Processor Registers

                                                      Number
Register Name                               Mnemonic  (Dec)   (Hex)               Type   Cat

Unimplemented                                                 100-017FFFFF
See Table 2-21                                                01800000-FFFFFFFF          2



Type:

R = Read-only register 
RW = Read-write register 
W = Write-only register 
W1C = Write 1 Clear 

Cat(egory), class-subclass, where: 
class is one of: 

1 = Implemented as per DEC standard 032 

2 = NVAX Plus specific implementation which is unique or different from the DEC standard 032 implementation 
subclass is one of: 

1 = Processed as appropriate by Ebox microcode 

2 = Converted to Mbox IPR number and processed via internal IPR command 

3 = Processed by internal IPR command 

4 = May be block decoded; reference causes UNDEFINED behavior

2.14 Revision History

Table 2-23: Revision History

Who                          When          Description of change
Mike Uhler                   06-Mar-1989   Release for external review.
Mike Uhler                   15-Dec-1989   Update for second-pass release.
Mike Uhler                   20-Jul-1990   Update to reflect implementation.
Mike Callander/Gil Wolrich   15-Nov-1990   NVAX Plus release for external review.
Gil Wolrich                  15-Mar-1991   Reverse mailbox pointer operands, add clr_io_pack ipr.

Chapter 3
External Interface

3.1 Overview

NVAX Plus can share system platforms which use EV chips in 128-bit mode. The CPU_CLK
runs at a cycle time as fast as 10 ns, and SYS_CLK can be set to 2, 3, or 4 times the CPU cycle
time. NVAX Plus is usable in a wide range of systems: workstations, small deskside servers and
timesharing machines, and midrange multiprocessor servers and timesharing machines.



3.2 Signals 

The following table lists all of the 291 signals on the NVAX_PLUS chip. In the "type" column, an
"I" means a pin is an input, an "O" means the pin is an output, a "T" means the pin is a tristate
output, and a "B" means the pin is tristate and bidirectional.



Table 3-1: NVAX_PLUS Signals

Signal Name            Count  Type  Function

clkIn_h, _l            2      I     Clock input
testClkIn_h, _l        2      I     Clock input for testing
cpuClkOut_h            1      O     CPU clock output
sysClkOut1_h, _l       2      O     System clock output, delayed
sysClkOut2_h, _l       2      O     System clock output, delayed
icMode_h[1]            1      I     Enables pp_cmd_h<2:0> for test mode
clk_rst_h              1      I     Put cpu and sys_clk timing gen. to known state
pp_data_h[11]          1      B     Parallel Test Port Data, MAB clock
pp_data_h[7..6]        2      B     Parallel port [7:6] if enabled, EVtagAdr_h[33..32]
pp_data_h[5..0]        6      B     Dedicated Parallel Test Port Data
osc16m_h               1      I     Interval timer 16MHz oscillator input
dcOk_h                 1      I     Power and clocks ok
reset_l                1      I     Reset
sRomOE_l               1      O     Serial ROM output enable
sRomD_h                1      I     Serial ROM data/Rx data
sRomClk_h              1      O     Serial ROM clock/Tx data
icMode[0]/pp_cmd[2]    1      I     Serial ROM fast fill, sRomFast_h/used as pp_cmd[2] in test mode
adr_h[33..32]          2      T     Address bus 33,32
adr_h[31..17]          15     B     Address bus tag section
adr_h[16..5]           12     T     Address bus index section
tagEq_l                1      O     Tag compare output
data_h[127..0]         128    B     Data bus
check_h[27..0]         28     B     Check bit bus
dOE_l                  1      I     Data bus output enable
pp_cmd[1:0]            2      I     EV dWSel_h[1..0], used to select port function in test mode
dRAck_h[2]             1      I     bus read acknowledge, load data
dRAck_h[1]             1      I     dRAck_h[1] cache/no_cache
dRAck_h[0]             1      I     bus read acknowledge, check ecc/parity
tagCEOE_h              1      O     tagCtl and tagAdr CE/OE
tagCtlWE_h             1      O     tagCtl WE
tagCtlV_h              1      B     Tag valid
tagCtlS_h              1      B     Tag shared
tagCtlD_h              1      B     Tag dirty
tagCtlP_h              1      B     Tag V/S/D parity
tagAdr_h[31..20]       12     I     Tag address [31..20]
tagAdr_h[19]           1      B     Tag address [19], Parallel Port [10] if enabled
tagAdr_h[18]           1      B     Tag address [18], Parallel Port [9] if enabled
tagAdr_h[17]           1      B     Tag address [17], Parallel Port [8] if enabled
tagAdrP_h              1      I     Tag address parity
tagOk_h, _l            2      I     Tag access from CPU is ok
dataCEOE_h[3..0]       4      O     data CE/OE, longword
dataWE_h[3..0]         4      O     data WE, longword
dataA_h[4]             1      O     data A[4]
dataA_h[3]             1      O     data A[3]
holdReq_h              1      I     Hold request
holdAck_h              1      O     Hold acknowledge
cReq_h[2..0]           3      O     Cycle request
cWMask_h[7..0]         8      O     Cycle write mask
cAck_h[2..0]           3      I     Cycle acknowledge
iAdr_h[12..5]          8      I     Invalidate address
pInvReq_h[1..0]        2      I     Invalidate request, Pcache
pMapWE_h[1..0]         2      O     Backmap WE, Pcache
err_irq_h[5]           1      I     External error interrupt
halt_irq_h[4]          1      I     Halt interrupt
irq_h[3..0]            4      I     Interrupt requests
vref                   1      I     Input reference/not used by NVAX Plus
tristate_l             1      I     Tristate for testing
cont_l                 1      I     Continuity for testing
test_mode_h            1      I     Enables pull-downs on check_h bits, was eclOut_h



The following table lists all of the signals that were not on EVAX which are being implemented
on the NVAX_PLUS chip. In the "type" column, an "I" means a pin is an input, an "O" means
the pin is an output, and a "B" means the pin is tristate and bidirectional.



Table 3-2: New_NVAX_PLUS Signals

Signal Name            Count  Type  Function

test_mode_h            1      I     Enables check_h pull downs
osc16m_h               1      I     Interval timer 16MHz oscillator input
pp_data_h[6..0]        7      B     Parallel Test Port Data
pInvReq_h[1..0]        2      I     Invalidate request, Pcache
pMapWE_h[1..0]         2      O     Backmap WE, Pcache



The following table lists all of the signals that were on EVAX which are not being implemented
on the NVAX_PLUS chip. In the "type" column, an "I" means a pin is an input, an "O" means
the pin is an output, and a "B" means the pin is tristate and bidirectional.



Table 3-3: EVAX Signals

Signal Name            Count  Type  Function

dInvReq_h              1      I     Invalidate request, Dcache
dMapWE_h               1      O     Backmap WE, Dcache
perf_h[3..0]           4      O     Performance monitor outputs






NVAX Plus CPU Chip Functional Specification, Revision 0.3, October 1991 



Table 3-3 (Cont.): EVAX Signals

Signal Name            Count  Type  Function

scan_h[3..0]           4      ?     Scan



3.2.1 Clocks 

External logic supplies NVAX Plus with a differential clock at the desired frequency of the internal
phases via the clkIn_h and clkIn_l pins. The NVAX Plus Clock Generator circuit produces the
four single-phase clocks, four inverted single-phase clocks, and four dual-phase clocks
required for internal operation.

NVAX Plus divides the input clock by **two** to generate the cpuClkOut_h. The false-to-true 
transition of cpuClkOut_h is the "CPU clock" used in the timing specification for the tagOk_l 
signal. 

The CPU clock is divided by a programmable value of 4, 6, or 8 (2, 3, or 4 CPU cycles) to generate a
system clock, which is supplied to the external interface via the sysClkOut1_h and sysClkOut1_l
pins. The system clock is delayed by a programmable number of CPU clocks between 0 and 3 to
generate a delayed system clock, which is supplied to the external interface via the sysClkOut2_h
and sysClkOut2_l pins.

The clock generator runs, generating cpuClkOut_h and the (correctly timed and positioned) system
clock outputs, any time an input clock is supplied. In particular, it runs during reset, so that
systems can phase-lock the clocks of several chips together before any of them are released from reset.

**The sysClkOut value of 6 times the cpuClkOut results in an asymmetric clock, asserted for 4
cpuClkOut periods, then deasserted for 2 cpuClkOut periods.**

The false-to-true transition of sysClkOutl_h is the "system clock" used as a timing reference 
throughout this specification. 

Almost all transactions on the external interface run synchronously to the CPU clock and phase
aligned to the system clock, so the external interface appears to be running synchronously to the
system clock (most setup and hold times are referenced to the system clock). The exceptions to
this are the fast, NVAX Plus controlled transactions on the external caches and the sample of the
tagOk_l input, which are synchronous to the CPU clock, but independent of the system clock.

3.2.2 DC_OK and Reset 

NVAX Plus contains a ring oscillator which is switched into service during power up to provide an
internal chip clock. The dcOk_h signal switches clock sources between the on-chip ring oscillator
and the external clock oscillator. If dcOk_h is false then the on-chip ring oscillator feeds the
clock generator, and NVAX Plus is held in reset, independent of the state of the reset_l signal. If
dcOk_h is true then the external clock oscillator feeds the clock generator (NVAX Plus does not
use the vRef input), and NVAX Plus is held in reset by reset_l.

Note that if the dcOk_h signal is generated by an RC delay, there is no check that the input clocks
are really running. This means that if a board is powered up in manufacturing with a missing,
defective, or mis-soldered clock oscillator then NVAX Plus will enter a possibly destructive
high-current state. Furthermore, if a clock oscillator fails in stage 1 burn-in then NVAX Plus may also
enter this state. The frequency and duration of such events need to be understood by the module
designer to decide if this is really a problem.

The reset_l signal forces the CPU into a known state. The reset_l signal is asynchronous, and
must be asserted for at least tbd CPU cycles after the assertion of dcOk_h to guarantee that the
CPU is reset. This should always be the case, since it also has to be held true for long enough to
guarantee that the serial ROM has reset its address counters (which takes about 100 ns).

The NVAX Plus CPU chip uses a 3.3V power supply. This 3.3V supply must be stable before any 
input goes above 4V. 

While it is in reset, NVAX Plus reads sysClkOut and external bus configuration information off the
irq_h pins. External logic should drive the configuration information onto the irq_h pins any time
reset_l is true.

NOTE

The irq_h pins are latched with the deasserting edge of reset_l.

The irq_h[2..1] bits encode the value of the divisor used to generate the system clock from the
CPU clock.

Table 3-4: System Clock Divisor

irq_h[2]  irq_h[1]  Ratio

F         F         2
F         T         2
T         F         3 asymmetric
T         T         4

The irq_h[4..3] bits encode the delay, in CPU clock cycles, from the "system clock" to sysClkOut2. 

Table 3-5: System Clock Delay

irq_h[4]  irq_h[3]  Delay

F         F         0
F         T         1
T         F         2
T         T         3
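
The two encodings above can be combined into a small decoder. The following is a minimal sketch, not part of the chip specification: the function and table names are hypothetical, and it assumes that "T" in the tables corresponds to a logic 1 on the corresponding irq_h pin at the deasserting edge of reset_l.

```python
# Sketch only: names are hypothetical; encodings follow Tables 3-4 and 3-5.
# Assumption: "T" in the tables corresponds to a logic 1 on the irq_h pin.

# (irq_h[2], irq_h[1]) -> system clock period, in CPU cycles
SYS_CLK_RATIO = {
    (0, 0): 2,
    (0, 1): 2,
    (1, 0): 3,  # asymmetric system clock
    (1, 1): 4,
}


def decode_reset_config(irq_h: int) -> dict:
    """Decode irq_h[4..1] as latched on the deasserting edge of reset_l."""
    bits = [(irq_h >> n) & 1 for n in range(6)]
    ratio = SYS_CLK_RATIO[(bits[2], bits[1])]
    delay = (bits[4] << 1) | bits[3]  # sysClkOut2 delay in CPU cycles, 0..3
    return {"ratio": ratio, "asymmetric": ratio == 3, "delay": delay}


def sys_clk_period_ns(cpu_cycle_ns: float, ratio: int) -> float:
    """System clock period for a given CPU cycle time."""
    return cpu_cycle_ns * ratio


if __name__ == "__main__":
    cfg = decode_reset_config(0b010110)  # irq_h[4..1] = T F T T
    print(cfg, sys_clk_period_ns(10.0, cfg["ratio"]), "ns")
```

With a 10 ns CPU cycle, the three possible ratios give system clock periods of 20, 30, and 40 ns.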



3.2.3 Initialization and Diagnostic Interface

After the reset_l signal is deasserted, but before NVAX Plus executes its first instruction, the
Pcache is written with bits out of a serial ROM (such as an AMD Am1736). The serial ROM
contains enough VAX code to complete the configuration of the external interface (e.g. setting the
timing on the external cache RAMs) and to diagnose the path between the CPU chip and the real
ROM.

Three signals are used to interface to the serial ROM. The sRomOE_l output signal supplies the
output enable to the ROM, serving both as an output enable and as a reset (refer to the serial
ROM specifications for details). The sRomClk_h output signal supplies the clock to the ROM that
causes it to advance to the next bit. The ROM data is read by NVAX Plus via the sRomD_h input
signal. The format of the bits in the serial ROM is tbd; however, driving sRomD_h false clears
the Pcache.

Once the data in the serial ROM has been loaded into the Pcache, sRomD_h can be used for a 
serial input line, and sRomClk_h can be used as a serial output line. 

It is possible to override the loading of the entire Pcache by driving the icMode_h<0> signal true
when reset is asserted. If icMode_h<0> (sRomFast) is asserted, the SROM is not copied to Pcache
and the first instruction is fetched from address E0040000 (hex), the console start address. This
feature is also used for test purposes to minimize chip tester time.

3.2.4 Address Bus 

The tristate, bidirectional adr_h pins provide a path for addresses to flow between NVAX Plus
and the rest of the system. The adr_h pins are connected to the buffers that drive the address
pins of the external cache RAMs, and to the transceivers that are located between the CPU local
address bus and the CPU module address bus.

The address bus is normally driven by NVAX Plus. NVAX Plus stops driving the address bus
during reset and during external cache hold. In these states the address bus acts like an input,
and the tagEq_l output is the result of an equality compare between adr_h and tagAdr_h. Only
bits that are part of the cache tag, as specified by the BC_SIZE field of the BIU_CTL IPR,
participate in the compare.

**The NVAX Plus tagEq_l determination does not include tagAdr parity.** 

3.2.5 Data Bus 

The tristate, bidirectional data_h pins provide a path for data to flow between NVAX Plus and
the rest of the system. The data_h pins connect directly to the I/O pins of the external cache data
RAMs and to the transceivers that are located between the NVAX Plus local data bus and the CPU
module data bus.

The tristate, bidirectional check_h pins provide a path for check bits to flow between the CPU
and the rest of the system. The check_h pins connect directly to the I/O pins of the external
cache data RAMs and to the transceivers that are located between the CPU local check bus and
the CPU module check bus. In "PV" mode the check_h pins do not drive when the data_h pins
are driving write data, allowing the PV byte parity generation logic to drive the check_h lines for
byte parity. The check_h lines not used for parity contain receivers and should be pulled up.
The check_h pins are not connected at wafer probe due to constraints on the number of signals which
can be probed. If the test_mode_h pin is asserted, internal pullups for check_h[27..0] are enabled.

The data bus is driven by NVAX Plus when it is running a fast write cycle on the external caches,
and when some type of write cycle has been presented to the external interface and external logic
has enabled the data bus drivers (via dOE_l).

If NVAX Plus is in ECC mode then the check_h pins carry 7 check bits for each longword on
the data bus. Bits check_h[6..0] are the check bits for data_h[31..0]. Bits check_h[13..7] are the
check bits for data_h[63..32]. Bits check_h[20..14] are the check bits for data_h[95..64]. Bits
check_h[27..21] are the check bits for data_h[127..96].

The following ECC code is used. This code is the same one used by the IDT49C460 and 
AMD29C660 32-bit ECC generator/checker chips. 



dddddddddddddddddddddddddddddddd
33222222222211111111110000000000
10987654321098765432109876543210
c6 XOR xxxxxxxx xxxxxxxx
c5 XOR xxxxxxxx xxxxxxxx
c4 XOR xx xxxxxx xx xxxxxx
c3 XNOR xxx xxx xx xxx xxx xx
c2 XNOR xx xx x xx xx x xx x xx x
c1 XOR x x x x x xxx x x x x x xxx
c0 XOR x xx x x xxx x x xxxx x x

By arranging the data and check bits correctly, it is possible to arrange that any number of errors 
restricted to a 4-bit group can be detected. One such arrangement is as follows: 

d[00], d[01], d[03], d[25]
d[02], d[04], d[06], c[06]
d[05], d[07], d[12], c[03]
d[08], d[09], d[11], d[14]
d[10], d[13], d[15], d[19]
d[16], d[17], d[22], d[28]
d[18], d[23], d[30], c[05]
d[20], d[27], c[04], c[00]
d[21], d[26], c[02], c[01]
d[24], d[29], d[31]
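
As a sanity check on the arrangement, the following sketch verifies that every one of the 32 data bits and 7 check bits falls into exactly one group of at most four bits. It is illustrative only: the names are hypothetical, and it takes the fourth group as d[08], d[09], d[11], d[14], which is the only reading under which each data bit appears exactly once.

```python
# Sketch only: verifies that the listed arrangement places each of the 32 data
# bits and 7 check bits in exactly one group of at most four bits.
# Assumption: the fourth group is read as d[08], d[09], d[11], d[14].

GROUPS = [
    ("d00", "d01", "d03", "d25"),
    ("d02", "d04", "d06", "c06"),
    ("d05", "d07", "d12", "c03"),
    ("d08", "d09", "d11", "d14"),
    ("d10", "d13", "d15", "d19"),
    ("d16", "d17", "d22", "d28"),
    ("d18", "d23", "d30", "c05"),
    ("d20", "d27", "c04", "c00"),
    ("d21", "d26", "c02", "c01"),
    ("d24", "d29", "d31"),
]


def is_partition(groups) -> bool:
    """True if every data and check bit appears exactly once across the groups."""
    listed = [bit for group in groups for bit in group]
    expected = {f"d{i:02d}" for i in range(32)} | {f"c{i:02d}" for i in range(7)}
    no_repeats = len(listed) == len(set(listed))
    return no_repeats and set(listed) == expected and all(len(g) <= 4 for g in groups)


if __name__ == "__main__":
    print("arrangement covers all 39 bits exactly once:", is_partition(GROUPS))
```

The check says nothing about the detection property itself, which depends on the check-bit equations; it only confirms that the grouping is complete and non-overlapping.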

If NVAX Plus is in PARITY mode then 4 of the check_h pins carry EVEN parity for each longword
on the data bus, and the rest of the bits are unused. Bit check_h[0] is the parity bit for
data_h[31..0]. Bit check_h[7] is the parity bit for data_h[63..32]. Bit check_h[14] is the parity bit for
data_h[95..64]. Bit check_h[21] is the parity bit for data_h[127..96].

If NVAX Plus is in "PV" mode then check_h[3..0] are the byte parity bits for data_h[31..0],
check_h[10..7] are the byte parity bits for data_h[63..32], check_h[17..14] are the byte parity bits for
data_h[95..64], and check_h[24..21] are the byte parity bits for data_h[127..96]. The four byte parity
bits for each longword are XORed to produce a single longword parity bit.

The ECC bit in the BIU_CTL IPR determines if NVAX Plus is in ECC mode or in PARITY mode.
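
The check-bit assignments for the three modes follow one pattern: each longword owns a 7-bit slice of check_h, and the mode determines how many bits of that slice are used. The sketch below illustrates this and the reduction of four byte parity bits to one longword parity bit. The function names are hypothetical, and the mapping of byte 0 to the lowest check bit and the even polarity of the byte parity are assumptions not stated in the text.

```python
# Sketch only: function names are hypothetical. Lane numbers follow the text:
# each longword owns a 7-bit slice of check_h starting at bit 7 * longword.
LANE_WIDTH = {"ECC": 7, "PARITY": 1, "PV": 4}


def check_lanes(mode: str, longword: int) -> list:
    """check_h bit numbers used for data_h longword 0..3 in the given mode."""
    base = 7 * longword
    return list(range(base, base + LANE_WIDTH[mode]))


def even_parity(value: int) -> int:
    """Parity bit that makes the total number of 1s (value plus bit) even."""
    return bin(value).count("1") & 1


def pv_byte_parity(longword_value: int) -> list:
    """Per-byte parity bits, byte 0 first (byte order and polarity are assumptions)."""
    return [even_parity((longword_value >> (8 * b)) & 0xFF) for b in range(4)]


def longword_parity_from_bytes(byte_bits) -> int:
    """XOR the four byte parity bits into one longword parity bit."""
    result = 0
    for bit in byte_bits:
        result ^= bit
    return result


if __name__ == "__main__":
    for mode in ("ECC", "PARITY", "PV"):
        print(mode, [check_lanes(mode, lw) for lw in range(4)])
```

Because parity is linear, XORing the four byte parity bits always yields the same value as computing parity over the whole longword.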

3.2.6 External Cache Control 

The external cache is a direct-mapped, write-back cache. NVAX Plus always views the external 
cache as having a tag for each 32-byte block (the same as the NVAX Plus Pcache). 

The external cache tag RAMs are located between NVAX Plus' local address bus and NVAX Plus' 
tag inputs. The external cache data RAMs are located between the CPU's local address bus and 
the CPU's local data bus. NVAX Plus reads the external cache tag RAMs to determine if it can 
complete a cycle without any module level action, and NVAX Plus reads or writes the external 
cache data RAMs if, in fact, this is the case.

A cycle requires no module level action if it is a non-LDxL read hit to a valid block, or a non-STxC
write hit to a valid but not shared block when not in "PV" mode. All other cycles require module
level action. All cycles require module level action if the external cache is disabled (the BC_EN
bit in the BIU_CTL IPR is cleared).
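
The rule above can be written as a single predicate. This is a minimal sketch under stated assumptions: the function name and argument names are hypothetical, and the "PV" mode qualifier is read as applying to the write case only.

```python
# Sketch only: hypothetical helper expressing the rule for when a cycle
# needs module level action. Assumption: the "PV" qualifier applies to writes.

def needs_module_action(op: str, hit: bool, valid: bool, shared: bool,
                        bc_en: bool = True, pv_mode: bool = False) -> bool:
    """op is one of "read", "write", "LDxL", "STxC"."""
    if not bc_en:
        return True  # external cache disabled: every cycle goes to the module
    if op == "read":
        return not (hit and valid)
    if op == "write":
        return not (hit and valid and not shared and not pv_mode)
    return True  # LDxL and STxC always require module level action


if __name__ == "__main__":
    print(needs_module_action("read", hit=True, valid=True, shared=True))
    print(needs_module_action("write", hit=True, valid=True, shared=True))
```

A read hit to a shared block completes locally, while a write hit to the same block does not, which is why tagCtlS_h can be treated as a write protect bit.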

All NVAX Plus controlled cycles on the external cache have fixed timing, described in terms of 
NVAX Plus's internal clock. The actual timing of the cycle is programmable allowing for flexibility 
in the choice of CPU clock frequencies and cache RAM speeds. 

The external cache RAMs can be partitioned into three sections; the tagAdr RAM, the tagCtl RAM, 
and the data RAM. Sections do not straddle physical RAM chips in non "PV" mode systems. 

NOTE 

For "PV" mode systems, since NVAX Plus only reads from the tagAdr RAM and tagCtl
RAM, these sections can be implemented in the same RAM chips.

3.2.6.1 The TagAdr RAM 

The tagAdr RAM contains the high order address bits associated with the external cache block,
along with a parity bit. The contents of the tagAdr RAM are fed to the on-chip address comparator
and parity checker via the tagAdr_h and tagAdrP_h inputs.

NVAX Plus verifies that tagAdrP_h is an EVEN parity bit over tagAdr_h when it reads the tagAdr 
RAM. NVAX Plus asserts c%cbox_hard_error if the parity is wrong and stops the reference. 

The number of bits of tagAdr_h that participate in the address compare and the parity check is
controlled by the BC_SIZE field in the BIU_CTL IPR. The tagAdr_h signals go all the way down
to address bit 17, allowing for a 128Kbyte cache built out of RAMs that are 8K deep.
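
The arithmetic behind the 128Kbyte figure can be sketched as follows. The helpers are hypothetical and take the cache size in bytes directly, because the encoding of the BC_SIZE field is not given here; the sketch also assumes that the RAM depth is the cache size divided by the 16-byte width of the data bus, and that the compared field extends up to bit 31.

```python
# Sketch only: hypothetical helpers. The BC_SIZE encoding is not described in
# this section, so the cache size is passed in bytes.
BUS_BYTES = 16  # the data bus is 128 bits wide


def tag_low_bit(cache_bytes: int) -> int:
    """Lowest address bit that belongs to the tag for a direct-mapped cache."""
    if cache_bytes <= 0 or cache_bytes & (cache_bytes - 1):
        raise ValueError("cache size must be a power of two")
    return cache_bytes.bit_length() - 1


def data_ram_depth(cache_bytes: int) -> int:
    """Number of 128-bit entries needed to hold the cache data."""
    return cache_bytes // BUS_BYTES


def tag_eq(adr: int, tag_adr: int, cache_bytes: int, high_bit: int = 31) -> bool:
    """Equality compare restricted to the bits that are part of the tag."""
    low = tag_low_bit(cache_bytes)
    mask = ((1 << (high_bit + 1)) - 1) & ~((1 << low) - 1)
    return (adr & mask) == (tag_adr & mask)


if __name__ == "__main__":
    size = 128 * 1024
    print(tag_low_bit(size), data_ram_depth(size))
```

For a 128Kbyte cache the tag starts at bit 17 and the data RAMs hold 8192 entries, matching the figures in the text.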

The chip enable or output enable for the tagAdr RAM is normally driven by a two input NOR gate
(such as the 74AS805B). One input of the two input NOR gate is driven by tagCEOE_h, and the
other input is driven by external logic. NVAX Plus drives tagCEOE_h false during reset, during
external cache hold, and during any external cycle. The OE bit in the BIU_CTL IPR determines
if tagCEOE_h has chip enable timing or output enable timing.

3.2.6.2 The TagCtl RAM 

The tagCtl RAM contains control bits associated with the external cache block, along with a
parity bit. NVAX Plus reads the tagCtl RAM via the three tagCtl signals to determine the state
of the block. NVAX Plus writes the tagCtl RAM via the three tagCtl signals to make blocks dirty.

NVAX Plus verifies that tagCtlP_h is an EVEN parity bit over tagCtlV_h, tagCtlS_h, and
tagCtlD_h when it reads the tagCtl RAM. NVAX Plus asserts c%cbox_hard_err if the parity is wrong and
stops the reference. NVAX Plus computes EVEN parity across the tagCtlV_h, tagCtlS_h, and
tagCtlD_h bits, and drives the result onto the tagCtlP_h pin, when it writes the tagCtl RAM.
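
The parity rule for the tagCtl RAM is small enough to show directly. The sketch below uses hypothetical function names and models only the parity computation, not the timing or the error signalling.

```python
# Sketch only: hypothetical helpers for the EVEN parity rule on the tagCtl bits.

def tag_ctl_parity(v, s, d) -> int:
    """EVEN parity bit over V, S, D: the four bits together hold an even number of 1s."""
    return int(v) ^ int(s) ^ int(d)


def tag_ctl_read_ok(v, s, d, p) -> bool:
    """Check performed on a tagCtl read; False corresponds to a parity error."""
    return tag_ctl_parity(v, s, d) == int(p)


def make_dirty(v, s, d):
    """Return the (V, S, D, P) values written when a block is made dirty."""
    return (int(v), int(s), 1, tag_ctl_parity(v, s, 1))


if __name__ == "__main__":
    print(make_dirty(1, 0, 0))
```

Making a valid, private block dirty changes two of the stored bits (D and P), since the parity must be recomputed with the new D value.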

The following combinations of the tagCtl RAM bits are allowed. Note that the bias toward
conditional write-through coherence is really only in name; the tagCtlS_h bit can be viewed
simply as a write protect bit.

Table 3-6: Tag Control Encodings

tagCtlV_h  tagCtlS_h  tagCtlD_h  Meaning

F          X          X          Invalid
T          F          F          Valid, private
T          F          T          Valid, private, dirty
T          T          F          Valid, shared
T          T          T          Valid, shared, dirty



NVAX Plus can satisfy a read probe if the tagCtl bits indicate the entry is valid (tagCtlV_h = T).
NVAX Plus can satisfy a write probe if the tagCtl bits indicate the entry is valid and not shared
(tagCtlV_h = T, tagCtlS_h = F).

The chip enable or output enable for the tagCtl RAM is normally driven by a two input NOR gate
(such as the 74AS805B). One input of the two input NOR gate is driven by tagCEOE_h, and the
other input is driven by external logic. NVAX Plus drives tagCEOE_h false during reset, during
external cache hold, and during any external cycle. The OE bit in the BIU_CTL IPR determines
if tagCEOE_h has chip enable timing or output enable timing.

The write enable for the tagCtl RAM is normally driven by a two input NOR gate (such as the
74AS805B). One input of the two input NOR gate is driven by tagCtlWE_h, and the other input
is driven by external logic. NVAX Plus drives tagCtlWE_h false during reset, during external
cache hold, and during any external cycle.

3.2.6.3 The Data RAM 

The data RAM contains the actual cache data, along with any ECC or parity bits. 

The most significant bits of the data RAM address are driven, via buffers, from the address bus.
The least significant bit of the data RAM address is driven by a two input NOR gate (such as
the 74AS805B). One of the inputs of the two input NOR gate is driven by dataA_h[4], and the
other input is driven by external logic. NVAX Plus drives dataA_h[4] false during reset, during
external cache hold, and during any external cycle.

The chip enables or output enables for the data RAM are driven by a two input NOR gate (such
as the 74AS805B). One input of the two input NOR gate is driven by dataCEOE_h[3..0], and
the other input is driven by external logic. NVAX Plus drives dataCEOE_h[3..0] false during
reset, during external cache hold, and during external cycles. (NVAX Plus sometimes drives
dataCEOE_h[3..0] true during external write cycles, to simplify merging old cache data with new
write data). The OE bit in the BIU_CTL IPR determines if dataCEOE_h[3..0] has chip enable
timing or output enable timing.

The write enables for the data RAM are normally driven by a two input NOR gate (such as the
74AS805B). One input of the two input NOR gate is driven by dataWE_h[3..0], and the other
input is driven by external logic. NVAX Plus drives dataWE_h[3..0] false during reset, during
external cache hold, and during any external cycle.

3.2.6.4 Backmaps

Some systems may wish to maintain backmaps of the contents of the Pcache to improve the 
quality of their invalidate filtering. NVAX Plus must maintain the backmaps for external cache 
read hits, since external cache read hits are controlled totally by NVAX Plus. External logic 
maintains the backmaps for external cycles (read misses, invalidates, and so on). 

The backmaps are only consulted by external logic, so that their format, or, for that matter, their 
existence, is of no concern to NVAX Plus. All NVAX Plus does is generate backmap write pulses 
at the right time. Simple systems will not bother to maintain backmaps, will not connect the 
backmap write pulses to anything, and will generate extra invalidates. 

The NVAX Plus Pcache is 8kB and can be configured as either a single set of 256 indexes, or two
sets of 128 indexes each. If NVAX Plus is allocating Pcache as two-way set associative, NVAX
Plus drives pMapWE_h[0] or pMapWE_h[1], depending on the Pcache set which is to be allocated,
whenever it fills the Pcache from the external cache, and systems must assert the corresponding
pInvReq_h[1:0] to invalidate an entry in Pcache.

If NVAX Plus is allocating Pcache as direct mapped, pMapWE_h[0] is driven and systems assert
pInvReq_h[0] to invalidate an entry in Pcache.
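
Both Pcache configurations describe the same 8kB of storage, given the 32-byte block size stated for the Pcache earlier in this chapter. The sketch below checks that arithmetic and shows which backmap write enable (and matching invalidate request) applies to a fill. The function names are hypothetical.

```python
# Sketch only: hypothetical helpers for the Pcache geometry and backmap pairing.
BLOCK_BYTES = 32  # Pcache block size, per the external cache description


def pcache_capacity(sets: int, indexes: int) -> int:
    """Total Pcache data capacity in bytes for a given organization."""
    return sets * indexes * BLOCK_BYTES


def backmap_line(two_way: bool, allocated_set: int = 0) -> int:
    """Index n of the pMapWE_h[n] / pInvReq_h[n] pair used for a fill."""
    if not two_way:
        return 0  # direct mapped: only pMapWE_h[0] and pInvReq_h[0] are used
    if allocated_set not in (0, 1):
        raise ValueError("allocated_set must be 0 or 1")
    return allocated_set


if __name__ == "__main__":
    print(pcache_capacity(1, 256), pcache_capacity(2, 128))
```

Either organization yields 8192 bytes; only the number of backmap write enables in use differs.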

The pMapWE_h[1..0] signals assert two cpuClkOut cycles into the second (last) data read cycle
and negate at the end of that cycle.

3.2.6.5 External Cache Access 

The external caches are normally controlled by NVAX Plus. Two methods exist for gaining access 
to the external cache RAMs. 

3.2.6.5.1 HoldReq and HoldAck

The simple method for external logic to access the external caches is to assert the holdReq_h 
signal. 

A holdReq_h/holdAck_h sequence can happen at any time, even in the middle of an external cycle.
All of the acknowledge-like signals (dOE_l, dRAck_h, cAck_h) work normally. The system logic
can use this functionality to maintain cache coherency operations while a system read/write is in
progress.

If the NVAX Plus ARB sequencer is 'IDLE' and a HoldReq is received, the HoldAck signal is
asserted with the next rising edge of SysClkOut. NVAX Plus discontinues cache cycles if the
HoldReq signal is recognized before the tag compare is completed. The NVAX Plus ARB sequencer
enters a 'stall' state in which HoldAck is asserted. If a read or write sequence is in progress
and has advanced beyond the tag compare cycle, the operation is completed. For read hits the
second octaword of data is read and the hold is acknowledged as the block is being filled to
the Pcache. For read misses the CREQ of read_block or LD_LK is driven to the system. The
hold is then acknowledged, allowing the system to access the Bcache. For write hits the write
completes and the hold is acknowledged in the next ARB cycle, which is an 'IDLE' before the next
operation can be dispatched. For write misses (or writes which do not probe Bcache), the CREQ
of write_block or STxC is driven to the system. As for system reads, the hold is acknowledged,
allowing the system access to the Bcache before completing the NVAX Plus write operation.

When HoldAck is asserted, NVAX Plus tri-states adr_h, tagCtlV_h, tagCtlS_h, tagCtlD_h, and
tagCtlP_h, and drives tagCEOE_h, tagCtlWE_h, dataCEOE_h, dataWE_h, and dataA_h false (the cReq_h
and cWMask_h signals are not modified in any way). Note that data_h (and check_h if not "PV") are
driven only if dOE_l is asserted during a write_block or STxC cycle; dOE_l needs to be deasserted
to tristate data_h (and check_h) during system write operations. When external logic is finished with
the external caches it negates holdReq_h. NVAX Plus detects the negation of holdReq_h, negates
holdAck_h, and re-enables its outputs. If the hold is acknowledged after a CREQ has been issued
the system must then complete the operation and respond with the appropriate cAck.

When HoldReq_h is received the address bus begins driving in 1 1/2 cpu cycles at internal phase 3
prior to the deassertion of HoldAck_h, and dataCEOE_h<3:0> and tagCEOE_h reassert at phase
2 after the next drive_first cpu cycle (2 1/4 cpu cycles for drv_clk = 2 cpu cycles, and sys_clk = 2
cpu cycles) if the hold sequence occurred during an idle NVAX Plus cycle. tagCEOE_h reasserts
at phase 2 after the next drive_first cpu cycle if NVAX Plus is stalled in a write probe sequence.

NOTE

tagCEOE_h and dataCEOE_h may deassert one phase after the assertion of
holdAck_h, whereas the other signals affected by holdAck_h are either deasserted or
tri-stated at the assertion of holdAck_h.

**Systems which use tagOk to obtain access to the cache can assert HoldReq with tagOk
deasserted in order to have NVAX Plus tri-state adr_h, data_h, check_h, tagCtlV_h, tagCtlS_h,
tagCtlD_h, and tagCtlP_h, drive tagCEOE_h, tagCtlWE_h, dataCEOE_h, dataWE_h, and
dataA_h false, and assert holdAck_h. This allows systems which do not use external muxing
access to the tag store.**

The holdReq_h signal is synchronous, and external logic must guarantee setup and hold
requirements with respect to the system clock. The holdAck_h signal is synchronous to the CPU clock
but phase aligned to the system clock, so it can be used as an input to state machines running
off the system clock.

The delay from holdReq_h assertion to holdAck_h assertion depends on the programming of
the external cache interface, and exactly how the system clock is aligned with a pending external
cache cycle. In the best case the external cache is idle or just about to begin a cycle, and
holdAck_h asserts at the same system clock edge that samples the holdReq_h assertion. The worst case
latency for holdAck_h is three cache access cycles.

3.2.6.5.2 TagOk 

The fastest way for external logic to gain access to the external caches is to use the tagOk_l
signal. TagOk_l is an NVAX Plus bus interface control signal that allows external logic to stall
a CPU cycle on the external cache RAMs at the last possible instant. All tradeoffs surrounding
the tagOk_l signal have been made in favor of high-performance systems, making tagOk_l next
to impossible to use in low-end systems.

The tagOk_l signal is synchronous; external logic must guarantee setup and hold requirements
with respect to the CPU clock. This implies very fast logic, since the CPU clock can run at 200
MHz for the binned parts.

The NVAX Plus ARB sequencer enters a stall state if the deassertion of tagOk_l is detected,
preventing the completion of a read or write which is in progress. When tagOk_l asserts, indicating
the Bcache is again controlled by NVAX Plus, any read or write sequence which was previously
stalled returns to the first bus cycle of the sequence. For cache reads, if either pMapWE<1:0>
asserts, that read is completed. NVAX Plus does not tri-state the busses that run between NVAX
Plus and the external cache RAMs (unless HoldReq is asserted). External logic must supply the
necessary multiplexing functions in the address and data path.

If the tagOk_l signal is true at the falling edge of the CPU_CLK prior to a cache cycle, the
external logic is guaranteeing that the tagCtl and tagAdr RAMs were owned by NVAX Plus in
the previous cache_speed cycles, that the tagCtl RAMs will be owned by NVAX Plus in the next
cache_speed cycles, that the data RAMs were owned by NVAX Plus in the previous cache_speed
cycles, and that the data RAMs will be owned by NVAX Plus in the next two cache_speed cycles.

NVAX Plus samples the tagOk_l signal at the very end of the tag read of an external cache cycle. 
If tagOk_l is true then NVAX Plus knows that no conflict is possible between external logic and 
its cycle. If tagOk_l is false NVAX Plus stalls. NVAX Plus knows that there is some kind of 
conflict (it may have already happened, or it may be going to happen before NVAX Plus can finish 
its cycle). In this case NVAX Plus stalls until tagOk_l is true (at which time all of the above 
assertions are true, which means, in particular, that any address NVAX Plus has been holding on 
the address bus all this time has made it through the external cache RAMs), and then it retries 
any stalled cache references. 

3.2.7 External Cycle Control 

NVAX Plus requests an external cycle when it determines that the cycle it wants to run requires 
module level action. 

An external cycle begins when NVAX Plus puts a cycle type onto the cReq_h outputs. Some cycles
put an address on the adr_h outputs, and additional information (low-order address bits, I/D
stream indication, write masks) on the cWMask_h outputs. All of these outputs are synchronous,
and NVAX Plus meets setup and hold requirements with respect to the system clock.

The cycle types are as follows. 

Table 3-7: Cycle Types

cReq_h[2]  cReq_h[1]  cReq_h[0]  Type

F          F          F          IDLE
F          F          T          BARRIER (not generated)
F          T          F          FETCH (not generated)
F          T          T          FETCHM (not generated)
T          F          F          READ_BLOCK
T          F          T          WRITE_BLOCK
T          T          F          LDxL
T          T          T          STxC



The BARRIER, FETCH and FETCHM cycles are functions generated by EV instructions and are 
not generated in NVAX Plus systems. 
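
As an illustration of Table 3-7, the following minimal Python sketch decodes a sampled cReq_h value into a cycle type. The names CYCLE_TYPES, NOT_GENERATED, decode_creq, and is_generated_by_nvax_plus are hypothetical helpers for this example and are not part of the specification. T is taken as 1 and F as 0, with cReq_h[0] in bit 0.

```python
# Cycle type encodings from Table 3-7, indexed by the 3-bit cReq_h value
# (bit 2 = cReq_h[2], bit 0 = cReq_h[0]; T = 1, F = 0).
CYCLE_TYPES = {
    0b000: "IDLE",
    0b001: "BARRIER",
    0b010: "FETCH",
    0b011: "FETCHM",
    0b100: "READ_BLOCK",
    0b101: "WRITE_BLOCK",
    0b110: "LDxL",
    0b111: "STxC",
}

# Cycle types that the text says NVAX Plus never generates.
NOT_GENERATED = {"BARRIER", "FETCH", "FETCHM"}


def decode_creq(creq: int) -> str:
    """Return the cycle type name for a 3-bit cReq_h sample."""
    if creq not in CYCLE_TYPES:
        raise ValueError("cReq_h is 3 bits wide")
    return CYCLE_TYPES[creq]


def is_generated_by_nvax_plus(cycle_type: str) -> bool:
    """True if the cycle type can appear on an NVAX Plus system."""
    return cycle_type not in NOT_GENERATED
```

External logic written for EV can use such a check to confirm that the BARRIER, FETCH, and FETCHM paths are never exercised by NVAX Plus.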

The READ_BLOCK cycle is generated on read misses. External logic reads the addressed block
from memory and supplies it, 128 bits at a time, to NVAX Plus via the data bus. External logic
may also write the data into the external cache, after writing a victim if necessary.

The WRITE_BLOCK cycle is generated on write misses, and on writes to shared blocks. External
logic pulls the 128 bits of write data from NVAX Plus via the data bus, and writes the valid
longwords to memory. The cWMask_h[7..0] signals for NVAX Plus have either cWMask[7..4] =
0000 or cWMask[3..0] = 0000 during WRITE_BLOCK cycles. If external logic sequences
dWSel_h[1], NVAX Plus drives the same octaword with each dOE_l, and the cWMask bus indicates
which longwords are valid. External logic may also write the data into the external cache, after
writing a victim if necessary.

The LDxL cycle is generated by the READ_LOCK microinstruction or for writes of byte/word data. The
cycle works just like a READ_BLOCK, although the external cache has not been probed (so the
external logic needs to check for hits), and the address has to be latched into a locked address
register.

The STxC cycle is generated by the WRITE_UNLOCK microinstruction and for writes of merged
byte/word data. The cycle works just like a WRITE_BLOCK, although the external cache has not 
been probed (so that external logic needs to check for hits), and the cycle can be acknowledged 
with a failure status. 

On WRITE_BLOCK and STxC cycles the cWMask_h pins supply longword write masks to the
external logic, indicating which longwords in the 32-byte block are, in fact, valid. The
cWMask_h[7..0] signals for NVAX Plus have either cWMask[7..4] = 0000 or cWMask[3..0] = 0000 during
WRITE_BLOCK and STxC cycles, as NVAX Plus writes at most one octaword per WRITE_BLOCK
or STxC cycle. A cWMask_h bit is true if the longword is valid. WRITE_BLOCK commands can
have any combination of mask bits set.

NOTE: For NVAX Plus, STxC cycles can have all the mask bits set for the octaword being written,
whereas STxC cycles for EV can only have combinations that correspond to a single longword or
quadword.
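
The one-octaword rule for write masks can be checked mechanically. The sketch below is illustrative only: write_mask_octaword and valid_longwords are hypothetical names, and it assumes that cWMask_h bit i corresponds to longword i of the 32-byte block, so that cWMask[3..0] covers the octaword with adr_h[4] = 0 and cWMask[7..4] covers the octaword with adr_h[4] = 1.

```python
from typing import List, Optional


def write_mask_octaword(cwmask: int) -> Optional[int]:
    """Return 0 or 1 for the octaword selected by an 8-bit longword mask.

    Returns None if no mask bit is set. Raises ValueError if bits are set
    in both halves, which NVAX Plus never does on WRITE_BLOCK or STxC.
    """
    if not 0 <= cwmask <= 0xFF:
        raise ValueError("cWMask_h is 8 bits wide")
    low, high = cwmask & 0x0F, cwmask >> 4
    if low and high:
        raise ValueError("NVAX Plus writes at most one octaword per cycle")
    if low:
        return 0
    if high:
        return 1
    return None


def valid_longwords(cwmask: int) -> List[int]:
    """List the longword numbers (0..7) whose mask bit is true."""
    return [i for i in range(8) if (cwmask >> i) & 1]
```

For example, a mask of 0xF0 selects octaword 1 with longwords 4 through 7 valid, which is legal for an NVAX Plus STxC cycle according to the NOTE above.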

On READ_BLOCK and LDxL cycles the cWMask_h pins have additional information about the
miss overloaded onto them. The cWMask_h[1..0] pins contain miss address bits [4..3] (indicating
the address of the quadword that actually missed), which is needed to implement quadword
read granularity to I/O devices. The cWMask_h[2] pin is true if the address is not in I/O space
and will be filled to the Pcache. Thus cWMask_h[2] looks like an EV D-stream reference to enable
system logic to backmap the NVAX Plus mixed I/D stream Pcache with the D-Map backmap. The
cWMask_h[3] pin is false for references that are targeted to bank 0 of the on-chip Pcache, and
true for references that are targeted to bank 1 of the on-chip Pcache. The cWMask_h[4] pin is
true for I-stream references for use by system logic, e.g. for a possible I-stream prefetch to memory.
The cWMask_h[5] pin contains address bit [2], providing longword information for "PV" mode I/O
space reads.
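
The field layout described above can be summarized in a short decoder. This is only a sketch: ReadMissInfo and decode_read_cwmask are hypothetical names, and cWMask_h is assumed to be passed as an integer with cWMask_h[0] in bit 0.

```python
from typing import NamedTuple


class ReadMissInfo(NamedTuple):
    quadword: int        # miss address bits [4..3], from cWMask_h[1..0]
    pcache_fill: bool    # cWMask_h[2]: not I/O space, will be filled to Pcache
    pcache_bank: int     # cWMask_h[3]: 0 = bank 0, 1 = bank 1
    istream: bool        # cWMask_h[4]: I-stream reference
    addr_bit2: int       # cWMask_h[5]: address bit [2] ("PV" mode I/O reads)


def decode_read_cwmask(cwmask: int) -> ReadMissInfo:
    """Decode cWMask_h as driven on READ_BLOCK and LDxL cycles."""
    return ReadMissInfo(
        quadword=cwmask & 0b11,
        pcache_fill=bool((cwmask >> 2) & 1),
        pcache_bank=(cwmask >> 3) & 1,
        istream=bool((cwmask >> 4) & 1),
        addr_bit2=(cwmask >> 5) & 1,
    )
```

System logic that maintains a backmap would typically look at pcache_fill and pcache_bank together to decide which backmap entry to update.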

The cycle holds on the external interface until external logic acknowledges it, by placing an 
acknowledgment type on the cAck_h pins. The cAck_h inputs are synchronous, and external 
logic must guarantee setup and hold requirements with respect to the system clock. 

The acknowledgment types are as follows.

Table 3-8: Acknowledgment Types

cAck_h[2]  cAck_h[1]  cAck_h[0]  Type

F          F          F          IDLE
F          F          T          HARD_ERROR
F          T          F          SOFT_ERROR
F          T          T          STxC_FAIL
T          F          F          OK



The HARD_ERROR type indicates that the cycle has failed in some catastrophic manner. NVAX 
Plus latches sufficient state to determine the cause of the error, and machine checks or initiates 
the hard error interrupt. 

The SOFT_ERROR type indicates that a failure occurred during the cycle, but the failure was
corrected. NVAX Plus latches sufficient state to determine the cause of the error, and initiates a
soft error interrupt.

The STxC_FAIL type indicates that a STxC cycle has failed. It is UNDEFINED what happens if
this type is used on anything but an STxC cycle. 

The OK type indicates success. 
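
A test bench for external logic might validate acknowledgment codes before driving them. The following is a minimal sketch under stated assumptions: CACK_TYPES and decode_cack are hypothetical names, encodings not listed in the acknowledgment table are rejected, and the UNDEFINED case (STxC_FAIL on a non-STxC cycle) is reported as an error rather than modelled.

```python
# Acknowledgment encodings, indexed by the 3-bit cAck_h value
# (bit 2 = cAck_h[2], bit 0 = cAck_h[0]; T = 1, F = 0).
CACK_TYPES = {
    0b000: "IDLE",
    0b001: "HARD_ERROR",
    0b010: "SOFT_ERROR",
    0b011: "STxC_FAIL",
    0b100: "OK",
}


def decode_cack(cack: int, cycle_type: str) -> str:
    """Return the acknowledgment type for a cAck_h sample.

    cycle_type is the name of the cycle being acknowledged,
    for example "READ_BLOCK" or "STxC".
    """
    if cack not in CACK_TYPES:
        raise ValueError("cAck_h encoding is not defined")
    ack = CACK_TYPES[cack]
    if ack == "STxC_FAIL" and cycle_type != "STxC":
        # The text states that the result is UNDEFINED in this case.
        raise ValueError("STxC_FAIL used on a cycle other than STxC")
    return ack
```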

The dRAck_h pins inform NVAX Plus that read data is valid on the data bus, and whether ECC checking
and correction or parity checking should be attempted. NVAX Plus loads the Pcache for non-I/O
space READ_BLOCK and LDxL transactions based on dRAck_h[1]. I/O space references do not
use dRAck_h[1] and are not allocated to the Pcache. The dRAck_h inputs are synchronous, and
external logic must guarantee setup and hold requirements with respect to the system clock. If
dRAck_h is sampled IDLE at a system clock then the data bus is ignored. If dRAck_h is sampled
non-IDLE at a system clock then the data bus is latched at that system clock, and external logic
must guarantee that the data meets setup and hold with respect to the system clock.

The acknowledgment types are as follows. 

Table 3-9: Read Data Acknowledgment Types

dRAck_h[2]  dRAck_h[1]  dRAck_h[0]  Type

F           F           F           IDLE
T           F           F           OK_NCACHE_NCHK
T           F           T           OK_NCACHE
T           T           F           OK_NCHK
T           T           T           OK



The first non-IDLE sample of dRAck_h tells NVAX Plus to sample data bytes [15..0], and the
second non-IDLE sample of dRAck_h tells NVAX Plus to sample data bytes [31..16]. Normally
external logic will drive the second dRAck_h and the cAck_h during the same system clock.
READ_BLOCK and LDxL transactions may be terminated with HARD_ERROR status before all
expected dRAck_h cycles are received.

It is UNDEFINED what happens if dRAck_h is asserted in a non-read cycle.

NVAX Plus checks dRAck_h[0] (the bit that determines if the block is ECC/parity checked) during
both halves of the 32-byte block. It is legal, but probably not useful, to check only one half of the
block.

NVAX Plus checks dRAck_h[1] (the bit that determines if a memory reference is to be cached
or not) during the second half of the 32-byte block. dRAck_h[1] is not necessary for I/O space
references. I/O references are not allocated to the Pcache for NVAX Plus.

For I/O reads two dRack assertions are expected, however systems (PV) may return a single 
octaword if a cAck is asserted at the same sysClkOut_h edge as the single dRack. 
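
The dRAck_h sampling rules can be sketched as a small behavioural model of the NVAX Plus side of a memory-space read. The names decode_drack and collect_read_block are hypothetical. The sketch assumes one (dRAck_h, 16 bytes of data) pair per system clock, and it does not model ECC, HARD_ERROR termination, or the single-octaword "PV" I/O read case.

```python
from typing import Iterable, List, Optional, Tuple


def decode_drack(drack: int) -> Optional[dict]:
    """Decode a 3-bit dRAck_h sample; None means IDLE (data bus ignored)."""
    if not 0 <= drack <= 0b111:
        raise ValueError("dRAck_h is 3 bits wide")
    if drack == 0b000:
        return None
    if not drack & 0b100:
        raise ValueError("encoding not listed in the dRAck_h table")
    return {"cache": bool(drack & 0b010), "check": bool(drack & 0b001)}


def collect_read_block(
    samples: Iterable[Tuple[int, bytes]]
) -> Tuple[bytes, bool, List[bool]]:
    """Assemble a 32-byte block from per-clock (dRAck_h, data) samples.

    Returns (block, cache, checks): the first non-IDLE sample supplies
    bytes [15..0] and the second supplies bytes [31..16]. The cache
    decision is taken from dRAck_h[1] of the second half, and checks
    holds the dRAck_h[0] value seen with each half.
    """
    halves: List[bytes] = []
    checks: List[bool] = []
    cache = False
    for drack, data in samples:
        info = decode_drack(drack)
        if info is None:
            continue
        if len(data) != 16:
            raise ValueError("each data bus sample is 16 bytes")
        halves.append(bytes(data))
        checks.append(info["check"])
        cache = info["cache"]  # only the second half's value is kept
        if len(halves) == 2:
            break
    if len(halves) != 2:
        raise ValueError("expected two non-IDLE dRAck_h samples")
    return halves[0] + halves[1], cache, checks
```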

The dOE_l input tells NVAX Plus if it should drive the data bus. It is a synchronous input,
so external logic must guarantee setup and hold with respect to the system clock. If dOE_l is
sampled true at a system clock then NVAX Plus drives the data bus at the system clock if it has
a WRITE_BLOCK or STxC request pending (the request may already be on the cReq pins, or it
may appear on the cReq pins at the same system clock edge as the data appears). If dOE_l is
sampled false at the system clock then NVAX Plus tri-states the data bus on the next system
clock cycle. The cycle type is factored into the enable so that systems can leave dOE_l asserted
unless it is necessary to write a victim.

The dWSel_h inputs of EV are not needed as NVAX Plus only presents 1 octaword to the data 
bus. 

3.2.8 Primary Cache Invalidate 

External logic needs to be able to invalidate primary cache blocks to maintain coherence. NVAX
Plus provides a mechanism to perform the necessary invalidates, but enforces no policy as to
when invalidates are needed. Simple systems may choose to invalidate more or less blindly, and
complex systems may choose to implement elaborate invalidate filters.

There are two situations where entries in the on-chip Pcache may need to be invalidated. 

The first situation is the obvious one. Any time an external agent updates a block in memory (for 
example, an I/O device does a DMA transfer into memory), and that block has been loaded into 
the external cache, then the external cache block must be either invalidated or updated. If that 
external cache block has been loaded into a block resident in the Pcache then that Pcache entry 
must be invalidated. 

External logic invalidates an entry in bank 0 of the Pcache by asserting the pInvReq_h[0] signal. 
NVAX Plus samples pInvReq_h[0] at every system clock. When NVAX Plus detects pInvReq_h[0] 
asserted, it invalidates the block in bank 0 of the Pcache whose index is on the iAdr_h pins.

External logic invalidates an entry in bank 1 of the Pcache by asserting the pInvReq_h[1] signal.
NVAX Plus samples pInvReq_h[1] at every system clock. When NVAX Plus detects pInvReq_h[1]
asserted, it invalidates the block in bank 1 of the Pcache whose index is on the iAdr_h pins.

If the Pcache is set to direct-mapped allocation, only pInvReq_h[0] is asserted, and iAdr_h[12] selects the
section of the Pcache to be invalidated.

It is legal to assert both pInvReq_h[1..0] in the same cycle.
NVAX Plus can accept an invalidate at every system clock.

The pInvReq_h[1..0] inputs are synchronous, and external logic must guarantee setup and hold
with respect to the system clock. The iAdr_h inputs are also synchronous, and external logic
must guarantee setup and hold with respect to the system clock in any cycle in which any of
pInvReq_h[1..0] are true.
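
The invalidate rules can be illustrated with a small model of the Pcache valid bits. This is a sketch under several assumptions that are not stated in this section: PcacheValidBits is a hypothetical name, each bank holds 128 blocks of 32 bytes (4 Kbytes per bank), and iAdr_h is passed as an address whose bits <11:5> give the index and whose bit <12> selects the section in direct-mapped mode. The behaviour of pInvReq_h[1] in direct-mapped mode is not described here, so the model ignores it.

```python
BLOCKS_PER_BANK = 128  # assumed: 4 Kbytes per bank / 32-byte blocks


class PcacheValidBits:
    """Valid bits for a two-bank Pcache, one bit per 32-byte block."""

    def __init__(self) -> None:
        self.valid = [[False] * BLOCKS_PER_BANK for _ in range(2)]

    def invalidate(self, pinvreq: int, iadr: int,
                   direct_mapped: bool = False) -> None:
        """Apply one system clock's worth of pInvReq_h[1..0] and iAdr_h."""
        index = (iadr >> 5) & 0x7F
        if direct_mapped:
            # Only pInvReq_h[0] is used; iAdr_h[12] selects the section.
            if pinvreq & 0b01:
                self.valid[(iadr >> 12) & 1][index] = False
            return
        for bank in (0, 1):
            if (pinvreq >> bank) & 1:
                self.valid[bank][index] = False
```

Because an invalidate can be accepted at every system clock, a model like this would be called once per clock with the sampled pin values.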

3.2.9 Interrupts

External interrupts are fed to NVAX Plus via the irq_h bus. The 6 interrupts are wired to
IRQ<3:0>, halt, and error. The timer interrupt is internal to NVAX Plus. The interrupts are
asynchronous, and level sensitive.

3.2.10 Electrical Level Configuration 

NVAX Plus drives and receives CMOS levels. 
The input circuits do not use the vRef input. 

3.2.11 Testing 

The tristate_l signal, if asserted, causes NVAX Plus to float all of its pins, with the exception of 
the clocks. 

The cont_l signal, if asserted, causes NVAX Plus to connect all of its pins to VSS, with the
exception of the clocks, vRef, dcOk_h, tristate_l, reset_l and cont_l.

3.3 64-Bit Mode 

NVAX Plus does not support the EV 64-bit external mode. 

3.4 Transactions 
3.4.1 Reset 

External logic resets NVAX Plus by asserting reset_l. When NVAX Plus detects the assertion of
reset_l it terminates all external activity, and places the output signals on the external interface 
into the following state. Note that all of the control signals have been placed in the state that 
allows external access to the external cache. 

Table 3-10: Reset State

Pin          State

sRomOE_l     F
sRomClk_h    T
adr_h        Z
data_h       Z
check_h      Z
tagCEOE_h    F
tagCtlWE_h   F
tagCtlV_h    Z
tagCtlS_h    Z
tagCtlD_h    Z
tagCtlP_h    Z
dataCEOE_h   F
dataWE_h     F
dataA_h      F
holdAck_h    F
cReq_h       FFF
cWMask_h     FFFFFFFF



After asserting reset_l for long enough to reset the serial ROM (100 ns), external logic negates 
reset_l. 

When NVAX Plus detects the negation of reset_l, it begins internal initialization. When this initialization
is completed, NVAX Plus microcode asserts sRomOE_l, enabling the output of the serial ROM
onto sRomD_h, and then determines whether the sROM is to be read by reading the SOE-IE IPR, which
contains the state of icMode<0> (sRomFast) at the deassertion of reset. If sRomFast is set, NVAX Plus
deasserts sRomOE_l and fetches an instruction from address E0040000. If sRomFast is not set, NVAX
Plus begins clocking bits out of the serial ROM and placing them into the Pcache. The timing is
the following (assuming NVAX Plus reads only 3 bits from the serial ROM).

[Timing diagram: reset_l, sRomOE_l, sRomClk_h, and the points at which sRomD_h is sampled.]

Each half-tick of the sRomClk_h signal is 27 CPU cycles long, which guarantees the minimum 
260ns clock high and clock low specifications and the 520ns clock to data specification of the serial 
ROM with 10ns CPU cycles. 
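
The arithmetic behind the 27-cycle half-tick can be checked with a few lines of Python. This is a sketch only: srom_timing is a hypothetical helper, and it assumes that the 520 ns clock to data figure is compared against a full tick (two half-ticks) of sRomClk_h.

```python
CLOCK_HIGH_LOW_MIN_NS = 260  # minimum clock high and clock low times
CLOCK_TO_DATA_NS = 520       # clock to data figure


def srom_timing(cpu_cycle_ns: float, half_tick_cycles: int = 27) -> dict:
    """Compare sRomClk_h timing with the serial ROM figures quoted above."""
    half_tick_ns = half_tick_cycles * cpu_cycle_ns
    full_tick_ns = 2 * half_tick_ns
    return {
        "half_tick_ns": half_tick_ns,
        "full_tick_ns": full_tick_ns,
        "meets_high_low": half_tick_ns >= CLOCK_HIGH_LOW_MIN_NS,
        "meets_clock_to_data": full_tick_ns >= CLOCK_TO_DATA_NS,
    }
```

With a 10 ns CPU cycle this gives a 270 ns half-tick and a 540 ns full tick, both slightly above the quoted minimums.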

The format for NVAX Plus sROM data is 8 Kbytes of continuous data, with the first bit being the
least significant bit of the first byte of the data. 

At the deassertion of reset, sRomOE_l is not asserted. The high to low transition of sRomOE_l
is generated when microcode writes the SOE-IE IPR. This maintains compatibility with EV and
allows sRomOE_l to indicate a reset to sROM bit counters if required. The LNP implementation
of the sROM is a parallel ROM and discrete shift registers, using reset_l to initialize the bit
counters.

After asserting sRomOE_l, microcode writes the Pcache TAG IPR address for Pcache index
addr<11:5> = 0000000, specifying the left bank (address<12>=0) with a tag<31:12>=00000(hex),
thus validating the 32-byte block of Pcache. Microcode then reads 32 bits of the sROM,
shifting the bits into a temporary register until a longword is completed. The bits are shifted so
that the first bit input is the least significant. SIO<serial_line_out> is hardware cleared at reset.
There is an inversion from SIO<serial_line_out> to the sRomClk_h pin, thus the state of
sRomClk_h at reset is high. Microcode reads each bit of the sROM by

1. writing SIO<serial_line_out> with 0 to set sRomClk_h high

2. waiting 27 CPU cycles to ensure sRomClk_h is high for 260ns for a 10ns part

3. writing SIO<serial_line_out> with 1 to set sRomClk_h low

4. waiting 27 CPU cycles to ensure sRomClk_h is low for 260ns for a 10ns part

5. reading the IPR for SIO<serial_line_in>

The sROM uses the high to low transition of sRomClk_h to load its output register and the low
to high transition of sRomClk_h to shift to the next bit. Initializing sRomClk_h to a high results
in the first edge of sRomClk_h being high to low, thus loading the initial ROM outputs to the
output shift register. Since the low to high transition of sRomClk_h is an input to a shift register,
the processor loads the output register and then inputs the first bit before the first shift clock
edge is driven.
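
The five-step bit read and the edge behaviour of the sROM can be sketched as a behavioural model. Everything named here (SerialRomModel, write_serial_line_out, read_bit, read_longword) is hypothetical and exists only for illustration. The model treats the sROM as an ideal bit stream, does not model the 27-cycle waits, and represents the inversion between SIO<serial_line_out> and sRomClk_h explicitly.

```python
class SerialRomModel:
    """Ideal sROM: falling clock edge loads, rising clock edge shifts."""

    def __init__(self, data: bytes) -> None:
        # First bit out is the least significant bit of the first byte.
        self.bits = [(byte >> i) & 1 for byte in data for i in range(8)]
        self.pos = 0
        self.clk = 1          # sRomClk_h is high at reset
        self.loaded = False   # output register not yet loaded

    def drive_clk(self, level: int) -> None:
        if self.clk == 1 and level == 0:
            self.loaded = True   # high to low: load the output register
        elif self.clk == 0 and level == 1 and self.loaded:
            self.pos += 1        # low to high: shift to the next bit
        self.clk = level

    def data_out(self) -> int:
        return self.bits[self.pos]


def write_serial_line_out(rom: SerialRomModel, value: int) -> None:
    """Model the inversion from SIO<serial_line_out> to sRomClk_h."""
    rom.drive_clk(1 - value)


def read_bit(rom: SerialRomModel) -> int:
    write_serial_line_out(rom, 0)  # step 1: sRomClk_h high
    # step 2: wait 27 CPU cycles (not modelled)
    write_serial_line_out(rom, 1)  # step 3: sRomClk_h low
    # step 4: wait 27 CPU cycles (not modelled)
    return rom.data_out()          # step 5: read SIO<serial_line_in>


def read_longword(rom: SerialRomModel) -> int:
    """Shift in 32 bits so that the first bit read is the least significant."""
    value = 0
    for i in range(32):
        value |= read_bit(rom) << i
    return value
```

In this model the first call to read_bit produces no rising edge, because the clock is already high, so the first bit is sampled before any shift occurs, as described above.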

After the first 32 bits are read, microcode writes the longword to addr<31:0>=00000000(hex).
The write hits in the Pcache and the first longword is written to the Pcache data section. The
write data is also written through the CBOX. This write will be packed with the next longword
and be put into the Write Queue. External Write Commands are removed from the Write Queue
by the Arb Sequencer when sRomOE_l is asserted but are not written to memory, preventing the
writing of the sROM data.

The next 32 bits are read. The second longword is then written to addr<31:0>=00000004. The
next 32 bits are read, and the third longword is written to addr<31:0>=00000008. Longwords 4, 5, 6, 7,
and 8 are written to addresses C, 10, 14, 18, and 1C. After the first 8 longwords are written,
microcode writes the Pcache TAG IPR address for Pcache index addr<11:5> = 0000001, specifying
the left bank (address<12>=0) with a tag<31:12>=00000(hex), thus validating the second 32-byte
block of Pcache. Again 8 longwords are read from the sROM and written to the Pcache block,
with the address being incremented by 4 bytes after each write. After the first 4 kbytes of data
have been written to the Pcache, microcode writes the Pcache TAG IPR address for Pcache index
addr<11:5> = 0000000, specifying the right bank (address<12>=1) with a tag<31:12>=00001(hex),
thus validating the first 32-byte block of Pcache for that bank. The next 4 kbytes are then
loaded to the right bank with a tag<31:12>=00001(hex). Thus the sROM data is placed into the NVAX
Plus Pcache as

1. Write Pcache TAG IPR. tag<31:12>=00000(hex), bank=0, index=00000 

2. set up initial addr<31:0>=00000000(hex) 

3. read longword from sROM 

4. write longword to addr<31:0> 

5. add 4(hex) to addr<31:0> 

6. if addr<4:2> not 000 repeat step 3 

7. after 8 longword writes addr<4:2>=000, 32 byte block completed, increment index 

8. if index not 000000, bank is not completed, write TAG IPR of next index, go to step 3 

9. if index=000000 and bank=0, set bank=1 for second 4 kbyte bank, write TAG IPR, go to step
3

10. if index=000000 and bank=1, sROM load is done

After completion of the sROM load, microcode initiates a macrocode fetch of the first instruction
from addr<31:0>=00000000.
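
The ten-step load sequence above can be restated as nested loops. The sketch below is not the microcode; load_srom is a hypothetical function that takes a callable returning the next longword from the sROM, and it records the tag written for each (bank, index) and the longword written at each address. It assumes 128 blocks per bank, consistent with 4 kbytes per bank and 32-byte blocks.

```python
from typing import Callable, Dict, Tuple

BLOCKS_PER_BANK = 128     # 4 kbytes per bank / 32-byte blocks
LONGWORDS_PER_BLOCK = 8   # 32-byte block / 4-byte longword


def load_srom(
    read_longword: Callable[[], int]
) -> Tuple[Dict[Tuple[int, int], int], Dict[int, int]]:
    """Return (tags, data) after loading 8 Kbytes of sROM contents.

    tags maps (bank, index) to tag<31:12>; data maps a byte address to
    the longword written there.
    """
    tags: Dict[Tuple[int, int], int] = {}
    data: Dict[int, int] = {}
    addr = 0
    for bank in (0, 1):
        for index in range(BLOCKS_PER_BANK):
            # Write the TAG IPR to validate the block before filling it.
            tags[(bank, index)] = addr >> 12
            for _ in range(LONGWORDS_PER_BLOCK):
                data[addr] = read_longword()  # write hits in the Pcache
                addr += 4
    return tags, data
```

Because the address simply counts up from 0, address<12> equals the bank number and tag<31:12> is 00000(hex) for the first 4 kbytes and 00001(hex) for the second, matching the description above.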



3.4.2 Fast External Cache Read Hit 



A fast external cache read consists of a probe read (overlapped with the first data read), followed
by a compare cycle and then a second data read. If the probe hits, tagOk_l is asserted, and
HoldReq is deasserted (i.e. no stall), the pMapWE_h of the allocated Pcache set is driven.

The following diagram assumes that the external cache is using 4X cache_speed timing, chip
enable control (OE_H/CE_L = L).



[Timing diagram: fast external cache read hit over CPU cycles 0 to 6. Signals shown: cpu_clk,
phase, adr_h, dataA_h[4], tagCEOE_h, tagCtlWE_h, tagAdr_h (driven by RAM), tagCtl_h (driven
by RAM), pMapWE_h, dataCEOE_h, dataWE_h, data_h and check_h (RAM octaword 0, then RAM
octaword 1).]



If the probe misses then pMapWE_h does not assert, and the sequence aborts at the end of CPU
CYCLE 2.

The address is driven from phase 3 prior to CPU CYCLE 0 and the data is latched at phase 4
of CPU CYCLE 1, providing 9 phases for external access at cache_speed = 4 times the cpu_clk
(2 CPU CYCLES).

3.4.3 Fast External Cache Write Hit 

A fast external cache write consists of a probe read, followed by a compare cycle, and then a 
single data write. 

The following diagram assumes that the external cache is using 2X system clock timing, chip
enable control (OE_H/CE_L = L), and a 1 cycle write pulse starting from cpu clock falling edge.



[Timing diagram: fast external cache write hit over CPU cycles 0 to 6. Signals shown: cpu_clk,
phase, adr_h/dataA_h[4], tagCEOE_h, tagCtlWE_h, tagAdr_h (driven by RAM), tagCtl_h (driven
by RAM, then by the CPU), dataCEOE_h, dataWE_h, data_h and check_h (driven by the CPU).]

If the probe misses then the cycle aborts at the end of cpu clock cycle 3.

3.4.4 Fast External Cache Byte/Word Write Hit

A fast external cache byte/word write consists of a probe read, followed by a compare cycle, a 
data merge cycle, and then a single data write. 

The following diagram assumes that the external cache is using 2X system clock timing, chip 
enable control (OE_H/CE_L = L), and a 1 cycle write pulse starting from cpu clock falling edge. 



[Timing diagram: fast external cache byte/word write hit over cpu_clk cycles 0 to 15. Signals
shown: internal clock, cpu_clk, phase, adr_h/dataA_h[4], tagCEOE_h, tagCtlWE_h, tagAdr_h
(driven by RAM), tagCtl_h (driven by RAM, then by the CPU), dataCEOE_h, dataWE_h, data_h
and check_h (driven by RAM, then by the CPU).]



If the probe misses then the cycle aborts at the end of cpu clock cycle 3. If a correctable ECC
error occurs on the read data, the write is delayed from cpu cycles 6 and 7 to cpu cycles
8 and 9.



3.4.5 Transfer to SysClk for External Transactions

The remainder of the transactions described in this chapter, READ_BLOCK, WRITE_BLOCK,
LDxL, and STxC, involve the external system logic, and are described with respect to sysClkOut1.
This section describes the delay between the internal cpu cycle which initiates a transaction requiring
external system logic and SYS_CLK cycle 0, where cReq_h is driven with the command request.
adr_h and cWMask are valid prior to the start of SYS_CLK cycle 0.

The NVAX Plus I/O sequencer runs once every CACHE_SPEED cycles. If the output of the I/O
sequencer initiates a transaction requiring external logic, the cReq_h command is asserted with
the next rising edge of sysClkOut1_h. For systems with CACHE_SPEED and sysClkOut both
programmed for 2 CPU cycles, the start of the SYS_CLK cycle is always one CPU cycle after the
I/O sequencer initiated the transaction.

[Timing diagram: CPU cycles, I/O sequencer cycles 0 to 3, cpu_clk, phase, and SYS_CLK cycles
0 to 2 (2X sysClkOut). Annotations mark the CPU cycle in which the I/O sequencer initiates a
READ_BLOCK, WRITE_BLOCK, LDxL, or STxC, and the following point at which cReq asserts,
which is the start of SYS_CLK cycle 0.]

If CACHE_SPEED and sysClkOut are not programmed to the same multiple of cpu_clk, the delay
to the rising edge of sysClkOut1_h and the assertion of cReq_h may be a full SYS_CLK cycle.

3.4.6 READ_BLOCK Transaction

A READ_BLOCK transaction appears at the external interface for reads which miss in the Pcache
and are external cache read misses, either because the read really was a miss, or because the external
cache has not been enabled.



[Timing diagram: READ_BLOCK transaction over SYS_CLK cycles 0 to 5. Signals shown:
sysClkOut1_h, adr_h, data_h, check_h, cReq_h, cWMask_h, dRAck_h, cAck_h.]



0. The READ_BLOCK cycle begins. NVAX Plus places the address of the block containing 
the miss on adr_h. NVAX Plus places the quadword-within-block and the I/D indication on 
cWMask_h. NVAX Plus places a READ_BLOCK command code on cReq_h. The external logic 
detects the command at the end of this cycle. 

1. The external logic obtains the first 16 bytes of data. Although a single stall cycle has been 
shown here, there could be no stall cycles, or many stall cycles. 

2. The external logic has the first 16 bytes of data. It places it on the data_h and check_h busses.
It asserts dRAck_h to tell NVAX Plus that the data and check bit busses are valid. NVAX 
Plus detects dRAck_h at the end of this cycle, and reads in the first 16 bytes of data at the 
same time. 

3. The external logic obtains the second 16 bytes of data. Although a single stall cycle has been 
shown here, there could be no stall cycles, or many stall cycles. 

4. The external logic has the second 16 bytes of data. It places it on the data_h and check_h
busses. It asserts dRAck_h to tell NVAX Plus that the data and check bit busses are valid.
NVAX Plus detects dRAck_h at the end of this cycle, and reads in the second 16 bytes of data
at the same time. In addition, the external logic places an acknowledge code on cAck_h to tell
NVAX Plus that the READ_BLOCK cycle is completed. NVAX Plus detects the acknowledge
at the end of this cycle. The address remains driven in the cycles after cAck as NVAX Plus fills
the Pcache.

5. Everything is idle on the EDAL. NVAX Plus moves fill data to the MBOX. A new external cache
cycle does not start until the fill is completed. The dataCEOE_h signals are asserted 1 cpu cycle
after cAck is recognized by the ARB sequencer.

Note that this picture did not mention the external caches. NVAX Plus drove all of the external
cache control signals false when it placed the READ_BLOCK command on the cReq_h outputs.
The external logic controls the updating of cache.
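
The external logic's side of the READ_BLOCK sequence in steps 0 through 5 can be sketched as a generator of per-SYS_CLK (dRAck_h, cAck_h) values. The function read_block_response and the constant names are hypothetical, the stall counts are parameters because the text allows none, one, or many stall cycles, and data bus contents are not modelled.

```python
from typing import List, Tuple

DRACK_IDLE, DRACK_OK = 0b000, 0b111  # dRAck_h encodings for IDLE and OK
CACK_IDLE, CACK_OK = 0b000, 0b100    # cAck_h encodings for IDLE and OK


def read_block_response(
    stalls_first: int = 1,
    stalls_second: int = 1,
    drack: int = DRACK_OK,
    cack: int = CACK_OK,
) -> List[Tuple[int, int]]:
    """Return (dRAck_h, cAck_h) for each SYS_CLK cycle, starting at cycle 0."""
    idle = (DRACK_IDLE, CACK_IDLE)
    cycles = [idle]                    # cycle 0: command appears on cReq_h
    cycles += [idle] * stalls_first    # fetching the first 16 bytes
    cycles.append((drack, CACK_IDLE))  # first 16 bytes valid
    cycles += [idle] * stalls_second   # fetching the second 16 bytes
    cycles.append((drack, cack))       # second 16 bytes valid, cycle acknowledged
    cycles.append(idle)                # everything is idle
    return cycles
```

With the default of one stall cycle before each half, the result has six entries, one for each of the numbered cycles 0 through 5 above.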

NVAX Plus performs ECC checking and correction (or parity checking) on the data supplied to
it via the data and check busses if so requested by the acknowledge code. It is not necessary to
place data into the external cache to get checking and correction.

3.4.7 Write Block

A WRITE_BLOCK transaction appears at the external interface on external cache write misses
(either because it really was a miss, or because the external cache has not been enabled, or because the
system is "PV"), or on external cache write hits to shared blocks.



[Timing diagram: WRITE_BLOCK transaction over SYS_CLK cycles 0 to 5. Signals shown:
sysClkOut_h, adr_h, data_h, check_h (not PV), cReq_h, cWMask_h, dOE_l, cAck_h.]



0. The WRITE_BLOCK cycle begins. NVAX Plus places the address of the block on adr_h. NVAX
Plus places the longword valid masks on cWMask_h. NVAX Plus only writes a single octaword
at a time, thus cWMask[7:4] = '0000 if adr_h[4] = '0, or cWMask[3:0] = '0000 if adr_h[4] =
'1. The dWSel_h inputs of EV are not needed, as NVAX Plus drives the same octaword at each
assertion of dOE_l.

1. NVAX Plus places the WRITE_BLOCK command code on cReq_h. The external logic detects 
the command at the end of this cycle. 

2. The external logic detects the command, and asserts dOE_l to tell NVAX Plus to drive the 16
bytes of data of the block onto the data bus. Since NVAX Plus only writes a single octaword,
the WRITE_BLOCK can be acknowledged with cAck in the same cycle in which the data is driven.
Systems which choose to handle WRITE_BLOCKs the same for EVAX and NVAX Plus will continue
the sequence with NVAX Plus driving out the same octaword of data. NVAX Plus continues to drive
the data in the system cycle following cAck (if dOE_l is asserted), providing data hold time. Although
a single stall cycle has been shown here, there could be no stall cycles, or many stall cycles.

3. If the external logic asserts dOE_l a second time to tell NVAX Plus to drive a second 16 bytes
of data onto the data bus, the same octaword is driven.

4. The external logic places an acknowledge code on cAck_h to tell NVAX Plus that the
WRITE_BLOCK cycle is completed. NVAX Plus detects the acknowledge at the end of this cycle. NVAX
Plus holds the address until the cAck is recognized by the ARB sequencer and a subsequent
bus operation is dispatched.

5. Everything is idle. 

Note that this picture did not mention the external caches. NVAX Plus drove all of the external
cache control signals false when it placed the WRITE_BLOCK command on the cReq_h outputs.
The external logic controls the updating of cache.

NVAX Plus performs ECC generation (or parity generation) on data it drives onto the data bus.
The check_h lines remain tristated for "PV" systems.

3.4.8 LDxL Transaction

An LDxL transaction appears at the external interface as a result of a READ_LOCK
microinstruction, or of a byte/word write which misses in the Bcache, being executed. The external cache
is not probed.

[Timing diagram: LDxL transaction over SYS_CLK cycles 0 to 5. Signals shown: sysClkOut_h,
adr_h, data_h, check_h, cReq_h, cWMask_h, dRAck_h, cAck_h.]

0. The LDxL cycle begins. NVAX Plus places the address of the block containing the data on 
adr_h. NVAX Plus places the quadword-within-block and the I/D indication on cWMask_h. 
LDxL cycles for byte/word writes indicate I so that system logic does not enter the block into 
the backmap. NVAX Plus places a LDxL command code on cReq_h. The external logic detects 
the command at the end of this cycle. 

1. The external logic obtains the first 16 bytes of data. Although a single stall cycle has been 
shown here, there could be no stall cycles, or many stall cycles. 

2. The external logic has the first 16 bytes of data. It places it on the data_h and check_h busses.
It asserts dRAck_h to tell NVAX Plus that the data and check bit busses are valid. NVAX
Plus detects dRAck_h at the end of this cycle, and reads in the first 16 bytes of data at the
same time.

3. The external logic obtains the second 16 bytes of data. Although a single stall cycle has been 
shown here, there could be no stall cycles, or many stall cycles. 

4. The external logic has the second 16 bytes of data. It places it on the data_h and check_h
busses. It asserts dRAck_h to tell NVAX Plus that the data and check bit busses are valid.
NVAX Plus detects dRAck_h at the end of this cycle, and reads in the second 16 bytes of data
at the same time. In addition, the external logic places an acknowledge code on cAck_h to
tell NVAX Plus that the LDxL cycle is completed. NVAX Plus detects the acknowledge at the
end of this cycle. The address holds while the data is either being loaded to the Pcache or merged
for a STxC to complete the byte/word write sequence.

5. Everything is idle. 

Note that with the exception of the command code output on the cReq pins, the LDxL cycle is the
same as a READ_BLOCK cycle.

3.4.9 STxC Transaction 

An STxC transaction appears at the external interface as a result of a WRITE_UNLOCK
microinstruction or a byte/word write in which the initial read probe missed in the Bcache. The external
cache is not probed.



NVAX Plus CPU Chip Functional Specification, Revision 0.3, October 1991 



[Timing diagram: STxC transaction over SYS_CLK cycles 0 to 5. Signals shown: sysClkOut_h,
adr_h, data_h, check_h (not PV), cReq_h, cWMask_h, dOE_l, cAck_h.]



0. The STxC cycle begins. NVAX Plus places the address of the block on adr_h. NVAX Plus
places the longword valid masks on cWMask_h. NVAX Plus places an STxC command code
on cReq_h. The external logic detects the command at the end of this cycle.

1. The external logic detects the command, and asserts dOE_l to tell NVAX Plus to drive the 16 
bytes of the block onto the data bus. 

2. NVAX Plus drives 16 bytes of write data onto the data_h and check_h busses, and the external
logic writes it into the destination. Since NVAX Plus only writes a single octaword, the
write_block can be acknowledged with cAck in the same cycle in which the data is driven. Systems
which choose to handle write_blocks the same for EVAX and NVAX Plus will continue the sequence
with NVAX Plus driving out the same octaword of data. NVAX Plus continues to drive the data in the
system cycle following cAck (if dOE_l is asserted), providing data hold time. Although a single stall
cycle has been shown here, there could be no stall cycles, or many stall cycles.

3. The external logic asserts dOE_l and dWSel_h to tell NVAX Plus to drive the second 16 bytes
of data onto the data bus. NVAX Plus continues to drive the same octaword of data. The cWMask_h
output indicates which octaword contains the write data.

4. NVAX Plus drives the same octaword of write data onto the data_h and check_h busses, and 
the external logic writes it into the destination. Although a single stall cycle has been shown 
here, there could be no stall cycles, or many stall cycles. In addition, the external logic places 
an acknowledge code on cAck_h to tell NVAX Plus that the STxC cycle is completed. NVAX 
Plus detects the acknowledge at the end of this cycle. NVAX Plus holds the address till the 
cAck is recognized by the ARB sequencer and a subsequent bus operation is dispatched. 

5. Everything is idle. 

Note that with the exception of the code output on the cReq pins, and the fact that external logic 
has the option of making the cycle fail by using a cAck code of STxC_FAIL, the STxC cycle is the 
same as the WRITE_BLOCK cycle. 

3.4.10 BARRIER Transaction 

NVAX Plus does not generate the BARRIER transaction. 

3.4.11 FETCH Transaction 

NVAX Plus does not generate the FETCH transaction. 

3.4.12 FETCHM Transaction

NVAX Plus does not generate the FETCHM transaction.

3.5 Summary of NVAX Plus options

The NVAX Plus chip can be used in system platforms intended for the EV processor chip (LASER,
COBRA, Flamingo). In addition, NVAX Plus has an optional mode, "PV", for use in systems in which
NVAX Plus is a replacement for the Mariah CPU. This section summarizes the key features of the
NVAX Plus chip that pertain to system configuration.

3.5.1 System Clock Divisors 

The sysClkOut period, the number of CPU cycles per sysClkOut cycle, is determined from IRQ 
lines at reset. 

• 2X 

• 3X ASYMMETRIC (COBRA) 

• 4X SYMMETRIC CLOCK >40NS PERIOD FOR FLAMINGO 

3.5.2 Cache Access 

The cache access time can be set to 2, 3, or 4 CPU cycles from BIU_CTL<BC_SPD>.

3.5.3 Flamingo I/O Address Mapping 

I/O space addresses can be mapped to Flamingo 'sparse' and 'dense' space by setting
BIU_CTL[WS_IO].

3.5.4 Direct Mapped Pcache 

The NVAX Plus chip can support a two-way set associative or direct-mapped Pcache as selected
from BIU_CTL<PCACHE_MODE>. This allows systems to backmap the Pcache exactly as the
Dcache for EV by selecting the direct-mapped option. When the direct-mapped option is selected,
allocation to a Pcache bank is based on address<12> instead of the allocate bit. To support the
direct-mapped option, the MBOX allocates fills to the Pcache bank selected by the Miss latch
for two-way associative operation and by address<12> for direct-mapped operation. In
direct-mapped mode the CBOX sends an invalidate request to the MBOX for bank 0 if iAdr<12> = 0,
and sends an invalidate request to the MBOX for bank 1 if iAdr<12> = 1.
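
The bank selection described above can be sketched as follows. This is only an illustrative model, not the chip logic: the function names and the allocate_bit parameter (standing in for whatever value the two-way allocation logic supplies) are assumptions made for the example.

```python
def pcache_fill_bank(address: int, direct_mapped: bool, allocate_bit: int) -> int:
    """Return the Pcache bank (0 or 1) that receives a fill.

    allocate_bit is a hypothetical stand-in for the bank chosen by the
    two-way set associative allocation logic.
    """
    if direct_mapped:
        return (address >> 12) & 1   # direct-mapped: bank follows address<12>
    return allocate_bit & 1          # two-way: bank follows the allocation logic


def pcache_invalidate_bank(i_adr: int) -> int:
    """Direct-mapped mode: bank targeted by a CBOX invalidate request."""
    return (i_adr >> 12) & 1         # bank 0 if iAdr<12> = 0, bank 1 if iAdr<12> = 1
```

With two 4 KB banks selected by address<12>, the 8 KB Pcache behaves as a single direct-mapped cache, which is what allows it to be backmapped like a direct-mapped Dcache.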

3.5.5 adr_h<33:32> 

adr_h<33:32> for I/O space references is selected from BIU_CTL<14:13>. I/O space for LASER
systems requires adr_h<33:32>=11, for COBRA systems adr_h<33:32>=10, and for Flamingo
systems adr_h<33:32>=01. The BIU_CTL register field allows for I/O space mapping of different
systems.

3.5.6 QW I/O WRITES/MTPR MAILBOX



Writes to the LMBPR require more than 32 bits, i.e. bits <39:32> = 00000000. In order to pack
more than a longword to an I/O space, a "pack_even_for_I/O" function can be enabled by writing
to IPR B8. This function can be disabled by a subsequent write to IPR B9. For the MTPR
MAILBOX instruction, the write to the LMBPR is done under microcode control. IPR B8 is
written to enable I/O space quadword packing. The two longwords which make up the MB_ADDR
(address of mailbox data structure) are then written. IPR B9 is written to clear the I/O packing
function.

The I/O pack function can be enabled with a MTPR B8 and can be disabled with a MTPR B9. For 
writes to I/O space other than to the LMBPR where a quadword write is required (e.g. COBRA 
systems) use the following macrocode sequence while in kernel mode. 

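• MFPR #PR$_IPL,-(SP)       ; save the current IPL on the stack

• MTPR #31,#PR$_IPL         ; raise IPL to 31

• MTPR #0,enable_io_pack    ; write IPR B8 to enable I/O packing

• MOVQ R,y                  ; quadword write to I/O space

• MTPR #0,disable_io_pack   ; write IPR B9 to disable I/O packing

• MTPR (SP)+,#PR$_IPL       ; restore the saved IPL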

The following restrictions need to be met to write quadword IO. 

1. The source mode for the MOVQ to IO space transaction must be register 

2. The MOVQ and MTPR B9 must be aligned to a 32-byte block 

3. The MOVQ destination must be quadword aligned 

4. The page where the quadword I/O is to be written cannot encounter an ACV or TNV memory 
management exception. (A TB miss is allowed) 

3.5.7 QW I/O READS 

For systems which contain quadword CSRs (Control Status Registers) in I/O space (COBRA), a single
quadword read is necessary in order to obtain consistent data for the CSR. When
BIU_CTL<QW_IO_RD> = 1:

1. The high_LW register is loaded with data<63:32> of any I/O read.

2. I/O reads with address<2> = 1 (not QW aligned) are converted to an IPR_RD of the high_LW
register, and data returns on data<31:0>.
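
The two rules above can be illustrated with a small behavioural model. It is a sketch under stated assumptions, not a description of the hardware: the class name, the bus_read_quadword callback, and the example CSR value are all hypothetical.

```python
class QwIoReadPath:
    """Toy model of I/O reads when BIU_CTL<QW_IO_RD> = 1."""

    def __init__(self, bus_read_quadword):
        # bus_read_quadword is a hypothetical callable: address -> 64-bit value.
        self.bus_read_quadword = bus_read_quadword
        self.high_lw = 0  # models the high_LW register

    def read_longword(self, address):
        if address & 0x4:
            # address<2> = 1: no bus read, the data comes from high_LW.
            return self.high_lw
        data = self.bus_read_quadword(address) & 0xFFFFFFFFFFFFFFFF
        self.high_lw = data >> 32      # rule 1: latch data<63:32>
        return data & 0xFFFFFFFF       # low longword returned on data<31:0>


bus_reads = []

def fake_csr(address):
    """Hypothetical quadword CSR that records each bus access."""
    bus_reads.append(address)
    return 0x1122334455667788

path = QwIoReadPath(fake_csr)
low = path.read_longword(0x2000)    # one quadword bus read
high = path.read_longword(0x2004)   # served from high_LW, no second bus read
```

Because the upper longword is served from the latched copy, both halves of the CSR come from the same bus read, which is the consistency property the text asks for.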



3.5.8 PV mode 

PV mode supports write-through caching and byte writes.

Write-through caching is supported by having writes not write the Bcache directly.

• the ARB sequencer dispatches directly to 'SYS_WR' if in "PV" mode

• check_h<27:0> output drivers remain tristated for writes; parity/ECC is not needed on "PV"
writes; PV system logic must generate byte parity.

PV mode supports byte writes: cWMask_h drives the byte mask instead of a longword mask.

• dataA_h<3> indicates for which QW the cWMask_h lines are the byte mask

• dataWE<1:0> contain byte mask information for the QW not addressed by dataA_h<3>

Other features of PV mode 

• on reads combine byte parity on check bits into LW parity, by providing xor tree for 4 check 
bits for each LW being input, for conversion into single LW parity bit 

• address<2> -> cWMask<5>; needed to specify I/O space read addresses to the LW

• dataA_h[4] tristates on read_block/LD_LK enabling PV system to control octaword address 
for B cache fills. 

• PV systems can respond to I/O space reads with a single dRack provided cAck is also sent at 
the same sysClkOut 

• supports byte/word write to I/O space within same LW address 
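
The byte-parity to longword-parity conversion listed above can be sketched as an XOR reduction. The sketch assumes even parity (each check bit is the XOR of the data bits it covers); this specification section does not state the parity sense, and the function names are illustrative only.

```python
def byte_parity(byte: int) -> int:
    """Even-parity check bit for one byte (assumed parity sense)."""
    return bin(byte & 0xFF).count("1") & 1


def longword_parity_from_check_bits(check_bits) -> int:
    """XOR tree: fold the four byte-parity bits of a longword into one bit."""
    parity = 0
    for bit in check_bits:
        parity ^= bit & 1
    return parity


longword = 0x12345678
byte_bits = [byte_parity((longword >> shift) & 0xFF) for shift in (0, 8, 16, 24)]
lw_parity = longword_parity_from_check_bits(byte_bits)
```

Under the even-parity assumption, the XOR of the four byte check bits equals the parity of all 32 data bits, so no data bits need to be re-examined.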

3.6 Revision History

Table 3-11: Revision History

Who            When           Description of change
Gil Wolrich    15-Nov-1990    NVAX PLUS release for external review.
Gil Wolrich    15-Jan-1991    Remove Vector references/update.
Gil Wolrich    3-Apr-1991     Include PV options/update.
Gil Wolrich    1-Aug-1991     Update.

Chapter 4
Chip Overview



4.1 NVAX Plus CPU Chip Box and Section Overview 

The NVAX Plus CPU Chip is a single-chip CMOS-4 macropipelined implementation of the base 
instruction group, and the optional vector instruction group of the VAX architecture. Included in 
the chip are: 

• CPU: Instruction fetch and decode, microsequencer, and execution unit 

• Control Store: 1600 61-bit microwords

• Primary Cache: 8 KB, 2-way set associative, physically-addressed, write through, mixed
instruction and data stream

• Instruction Cache: 2 KB, direct-mapped, virtually addressed, instruction stream only 

• Translation Buffer: 96 entries, fully associative 

• Floating Point: 4 stage, pipelined, integrated floating point unit 

• EDAL Interface: Support for six cache sizes (4MB, 2MB, 1MB, 512KB, 256KB, 128KB), 
and four RAM speeds. 

The NVAX chip is designed in CMOS-4 with a typical cycle time of 14 ns, and with the option of 
running chips at a slower or faster cycle time. The chip can be incorporated into many different 
system environments, ranging from the desktop to the midrange, and from single processor to 
multiprocessor systems. 

The NVAX is a macropipelined design: it pipelines macroinstruction decode and operand fetch 
with macroinstruction execution. Pipeline efficiency is increased by queuing up instruction infor- 
mation and operand values for later use by the execution unit. Thus, when the macropipeline is 
running smoothly, the Ibox (instruction parser/operand fetcher) is running several macroinstruc- 
tions ahead of the Ebox (execution unit). Outstanding writes to registers or memory locations are 
kept in a scoreboard to ensure that data is not read before it has been written. See Chapter 5 
for a more in-depth discussion of the macropipeline. 

This chapter gives an overview of the different sections, or "boxes", that comprise the NVAX Plus
CPU. For more information on any of the boxes, please see the appropriate chapters within this
specification. Figure 4-1 is a block diagram of the boxes and the major buses that run between
them.

Figure 4-1: NVAX Plus CPU Block Diagram

4.1.1 The Ibox

The Ibox decodes VAX instructions and parses operand specifiers. Instruction control, such as 
the control store dispatch address, is then placed in the instruction queue for later use by the 
Microsequencer and Ebox. The Ibox processes the operand specifiers at a rate of one specifier per 
cycle and, as necessary, initiates specifier memory read operations. All the information needed 
to access the specifiers is queued in the source queue and destination queue in the Ebox. 

The Ibox prefetches instruction stream data into the prefetch queue (PFQ), which can hold 16 
bytes. The Ibox has a dedicated instruction-stream-only cache, called the virtual instruction cache 
(VIC). The VIC is 2 KB, with a block and fill size of 32 bytes.

The Ibox has both read and write ports to the GPR and MD portions of the Ebox register file 
which are used to process the operand specifiers. The Ibox maintains a scoreboard to ensure that 
reads and writes to the register file are always performed in synchronization with the Ebox. The 
Ibox stops processing instructions and operands upon issuing certain complex instructions (for 
example, CALL, RET, and character string instructions). This is done to maintain read/write 
ordering when the Ebox will be altering large amounts of VAX state.

Since the Ibox is often parsing several macroinstructions ahead of the Ebox, the correct value
for the PSL condition codes is not known at the time the Ibox executes a conditional branch
instruction. Rather than emptying the pipe, the Ibox predicts which direction the branch will
take, and passes this information on to the Ebox via the branch queue. The Ebox later signals
if there was a misprediction, and the hardware backs out of the path. The branch prediction
algorithm utilizes a 512-entry RAM, which caches four bits of branch history per entry.
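
A history-based predictor of this shape can be sketched as below. Only the table dimensions (512 entries, four history bits) come from the text. The indexing by low-order PC bits and the majority-vote prediction rule are assumptions chosen for illustration, and are not claimed to be the NVAX Plus algorithm.

```python
HISTORY_ENTRIES = 512   # from the text: 512-entry RAM
HISTORY_BITS = 4        # from the text: four bits of history per entry


class BranchHistoryTable:
    def __init__(self):
        self.table = [0] * HISTORY_ENTRIES

    def _index(self, pc: int) -> int:
        return pc % HISTORY_ENTRIES          # assumption: index by low PC bits

    def predict(self, pc: int) -> bool:
        """Predict taken when most recorded outcomes were taken (assumed rule)."""
        history = self.table[self._index(pc)]
        return bin(history).count("1") > HISTORY_BITS // 2

    def update(self, pc: int, taken: bool) -> None:
        """Shift the actual outcome into the entry's 4-bit history."""
        i = self._index(pc)
        mask = (1 << HISTORY_BITS) - 1
        self.table[i] = ((self.table[i] << 1) | int(taken)) & mask
```

A misprediction reported later by the Ebox would correspond to calling update with the actual outcome after the pipeline has backed out of the wrong path.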

4.1.2 The Ebox and Microsequencer 

The Ebox and Microsequencer work together to perform the actual "work" of the VAX instructions. 
Together they implement a four stage micropipelined unit, which has the ability to stall and to 
microtrap. The Ebox and Microsequencer dequeue instruction and operand information provided
by the Ibox via the instruction queue, the source queue, and the destination queue. For literal type 
operands, the source queue contains the actual operand value. In the case of register, memory, 
and immediate type operands, the source queue holds a pointer to the data in the Ebox register 
file. The contents of memory operands are provided by the Mbox based on earlier requests from 
the Ibox. GPR results are written directly back to the register file. Memory results are sent to 
the Mbox, where the data will be matched with the appropriate specifier address previously sent 
by the Ibox. At times, the Ebox initiates its own memory reads and writes using E%VA_BUS and
E%WBUS. 

The Microsequencer determines the next microword to be fetched from the control store. It 
then provides this cycle-by-cycle control to the Ebox. The Microsequencer allows for eight-way 
microbranches, and for microsubroutines to a depth of six. 

The Ebox contains a five-port register file, which holds the VAX GPRs, six Memory Data Registers
(MDs), six microcode working registers, and ten miscellaneous CPU state registers. It also
contains an ALU, a shifter, and the VAX PSL. The Ebox uses the RMUX, controlled by the retire
queue, to order the completion of Ebox and Fbox instructions. As the Ebox and the Fbox are 
distinct hardware resources, there is some amount of execution overlap allowed between the two 
units. 

The Ebox implements specialized hardware features in order to speed the execution of certain
VAX instructions: the population counter (CALLx, PUSHR, POPR), and the mask processing unit
(CALLx, RET, FFx, PUSHR, POPR). The Ebox also has logic to gather hardware and software 
interrupt requests, and to notify the Microsequencer of pending interrupts. 

4.1.3 The Fbox 

The Fbox implements a four-stage pipelined execution unit for the floating point and integer
multiply instructions. Operands are supplied by the Ebox up to 64 bits per cycle on E%ABUS and
E%BBUS. Results are returned to the Ebox 32 bits per cycle on F%RESULT. The Ebox is responsible
for storing the Fbox result in memory or the GPRs.

4.1.4 The Mbox

The Mbox receives read requests from the Ibox (both instruction stream and data stream) and 
from the Ebox (data stream only). It receives write/store requests from the Ebox. Also, the Cbox 
sends the Mbox fill data and invalidates for the Pcache. The Mbox arbitrates between these 
requesters, and queues requests which cannot currently be handled. Once a request is started, 
the Mbox performs address translation and cache lookup in two cycles, assuming there are no 
misses or other delays. The two-cycle Mbox operation is pipelined. 

The Mbox uses the translation buffer (96 fully associative entries) to map virtual to physical 
addresses. In the case of a TB miss, the memory management hardware in the Mbox will read 
the page table entry and fill the TB. The Mbox is also responsible for all access checks, TNV 
checks, M-bit checks, and quadword unaligned data processing. 

The Mbox houses the Primary Cache (Pcache). The Pcache is 8KB, writethrough, with a block 
and fill size of 32 bytes. 

The Pcache can be configured at reset to be either direct mapped or 2-way set associative. 

The Pcache state is maintained as a subset of the Backup Cache. System logic, possibly using
backmaps, is responsible for ensuring that the Pcache remains a subset of the Backup Cache.

The Mbox ensures that Ibox specifier reads are ordered correctly with respect to Ebox specifier 
stores. This memory "scoreboarding" is accomplished by using the PA queue, a small list of 
physical addresses which have a pending Ebox store. 
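
The memory scoreboarding idea can be sketched as a small queue of pending store addresses that later reads are checked against. The capacity, the in-order retirement, and the quadword comparison granularity used here are assumptions for illustration; this overview does not specify them.

```python
from collections import deque


class PAQueue:
    """Toy model of a list of physical addresses with a pending Ebox store."""

    def __init__(self, capacity: int = 8, granule: int = 8):
        self.capacity = capacity    # assumption: queue depth
        self.granule = granule      # assumption: addresses compared per quadword
        self.pending = deque()

    def add_pending_store(self, pa: int) -> bool:
        """Record a destination address; False means the queue is full."""
        if len(self.pending) >= self.capacity:
            return False
        self.pending.append(pa // self.granule)
        return True

    def retire_store(self) -> None:
        """The oldest pending store has completed (assumed in-order)."""
        self.pending.popleft()

    def read_must_wait(self, pa: int) -> bool:
        """True when a specifier read hits an address with a pending store."""
        return (pa // self.granule) in self.pending
```

A read for which read_must_wait returns True would be held until the matching store retires, which preserves the read-after-write ordering described above.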

4.1.5 The Cbox 

The Cbox initiates access to the second level cache (the Backup Cache, or Bcache), and issues 
memory requests. Both the tags and data for the Bcache are stored in off-chip RAMs. The size and 
access time of the Bcache RAMs can be configured as needed by different system environments. 
The Bcache sizes supported are 4 MB, 2 MB, 1 MB, 512 KB, 256 KB, and 128 KB. System logic
is responsible for Bcache fills and coherency functions. The Cbox packs sequential writes to the
same octaword in order to minimize Bcache write accesses. Multiple write commands are held
in the eight-entry WRITE_QUEUE.
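
Write packing of this kind can be sketched as follows. The eight-entry depth and the octaword (16-byte) packing unit come from the text. The entry layout and the rule of merging only into the most recent entry are assumptions used to model "sequential writes".

```python
OCTAWORD = 16      # bytes
QUEUE_DEPTH = 8    # from the text: eight-entry write queue


class WriteQueue:
    def __init__(self):
        # Each entry is [octaword_base_address, {byte_offset: byte_value}].
        self.entries = []

    def write(self, address: int, data: bytes) -> bool:
        """Queue a write; False means the queue is full and the writer must wait."""
        base = address & ~(OCTAWORD - 1)
        offset = address - base
        if offset + len(data) > OCTAWORD:
            raise ValueError("write crosses an octaword boundary")
        if self.entries and self.entries[-1][0] == base:
            entry = self.entries[-1]          # pack into the newest entry
        elif len(self.entries) < QUEUE_DEPTH:
            entry = [base, {}]
            self.entries.append(entry)
        else:
            return False
        for i, value in enumerate(data):
            entry[1][offset + i] = value
        return True
```

Two longword writes to 0x100 and 0x104 therefore occupy one entry and would cost a single Bcache write access in this model.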

4.1.6 Major Internal Buses

This is a list of the major interbox buses:

• B%S6_DATA:

This bidirectional bus between the Cbox and MBox is used to transfer write data to the backup
cache, or to transfer fill data to the primary cache.

• C%CBOX_ADDR: 

This bus is used to transfer the physical address of a Pcache invalidate from the Cbox to the 
MBox. 

• E%ABUS, E%BBUS: 

These two 32-bit buses contain the A- and B-port operands for the Ebox, and are also used 
to transfer operand data to the Fbox. 

• E%IBOX_IA_BUS:

This bus is used by the Ibox to read the Ebox Register File in order to perform an operand
access. An example is to read a register's contents for a register deferred type specifier.

• E%DQ_RETIRE*:

This collection of related buses transfers information from the Ebox to the Ibox when a
destination queue entry is retired.

• E%SQ_RETIRE*:

This collection of related buses transfers information from the Ebox to the Ibox when a source 
queue entry is retired. 

• E%VA_BUS: 

This bus transfers an address from the Ebox to the MBox. 

• E%WBUS: 

This 32-bit bus transfers write data from the RMUX to the register file and the Mbox. 

• E_USQ_CSM9atfEB: 

This bus carries Control Store data from the Microsequencer to the Ebox. 

• E_BUS%UTEST: 

This 3-bit bus transfers microbranch conditions from the Ebox to the microsequencer. 

• F%RESULT: 

This bus is used to transfer results from the Fbox to the Ebox. 

• I%IBOX_ADDR: 

This bus transmits the virtual address of an Ibox memory reference to the Mbox. The address 
may be for instruction prefetch or an operand access. 

• I%IQ_BUS:

This bus carries instruction information from the Ibox to the Instruction Queue in the 
Microsequencer. 

• I%IBOX_IW_BUS:

This bus is used by the Ibox to write the Ebox Register File for autoincrement/decrement type 
specifiers and to deliver immediate operands to the Register File. 

• I%OPERAND_BUS:

This bus transfers information from the Ibox to the source and destination queues in the 
Ebox. 

• M%MD_BUS:

The bus returns right-justified memory read data from the Mbox to either the Ibox (64 bits) 
or the Ebox (32 bits). 

• M%S6_PA: 

This bus transfers the address for a backup cache reference from the MBox to the Cbox.

Table 4-1: Revision History

Who                When           Description of change
Debra Bernstein    06-Mar-1989    Release for external review.
Mike Uhler         18-Dec-1989    Update for second-pass release.
Gil Wolrich        15-Nov-1990    Update for NVAX Plus external release.

Chapter 5

Macroinstruction and Microinstruction Pipelines 



5.1 Introduction 

This chapter discusses the architecture of the NVAX Plus CPU macroinstruction and microin- 
struction pipeline. It includes a section of general pipeline fundamentals to set the stage for the 
specific NVAX Plus CPU implementation of the pipeline. This is followed by an overview of the 
NVAX Plus CPU pipeline, an examination of macroinstruction execution, and a discussion of stall 
and exception handling from the viewpoint of the Ebox. 

5.2 Pipeline Fundamentals 

This section discusses the fundamentals of instruction pipelining in a general manner that is 
independent of the NVAX Plus CPU implementation. It is intended as a primer for those readers 
who do not understand the concept and implications of instruction pipelining. Readers familiar 
with this material are encouraged to skip (or at most skim) this section.

5.2.1 The Concept of a Pipeline 

The execution of a VAX macroinstruction involves a sequence of steps which are carried out 
in order to complete the macroinstruction operation. Among these steps are: instruction fetch, 
instruction decode, specifier evaluation and operand fetch, instruction execution, and result store. 
On the simplest machines, these steps are carried out sequentially, with no overlap of the steps, 
as shown in Figure 5-1.



Figure 5-1: Non-Pipelined Instruction Execution

              ------------------------------ Time ------------------------------>

Instruction 1 |S0|S1|S2|S3|S4|S5|S6|
Instruction 2                      |S0|S1|S2|S3|S4|S5|S6|
Instruction 3                                           |S0|S1|S2|S3|S4|S5|S6|



In this diagram, "S0" through "S6" denote particular steps in the execution of an instruction.
For this simple scheme, all of the steps for one instruction are performed, and the instruction is 
completed, before any of the steps for the next instruction are started. 

In more complex machines, one or more steps of the execution process are carried out in parallel 
with other steps. For example, consider Figure 5-2.

Figure 5-2: Partially-Pipelined Instruction Execution

              -------------------------- Time -------------------------->

Instruction 1 |S0|S1|S2|S3|S4|S5|S6|
Instruction 2                   |S0|S1|S2|S3|S4|S5|S6|
Instruction 3                                     |S0|S1|S2|S3|S4|S5|S6|

In this example, step S6 of each instruction is overlapped in time (or executed in parallel) with
step S0 of the next instruction. In doing so, the number of instructions executed per unit time
(instruction throughput) goes up because an instruction appears to take less time to complete. 

In the most complex machines, most (or all) of the steps are executed in parallel as indicated in
Figure 5-3.

Figure 5-3: Fully-Pipelined Instruction Execution

In this example every step of instruction execution is performed in parallel with every other step.
This means that a new instruction is started as soon as step S0 is completed for the previous
instruction. If each step, S0..S6, took the same amount of time, the apparent instruction
throughput would be seven times greater than that of Figure 5-1 above, even though each instruction
takes the same amount of time to execute in both cases.

Figures 5-2 and 5-3 are examples of the concept of instruction pipelining, in which one or
more steps necessary to execute an instruction are performed in parallel with steps for other 
instructions. 
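
As a worked illustration of the three cases, the short sketch below computes how long a run of instructions takes when a new instruction starts a fixed number of steps after the previous one. The function name and the equal-step-time model are assumptions for the example; they idealize the figures and ignore stalls.

```python
def completion_time(n_instructions: int, n_steps: int = 7, start_interval: int = 7) -> int:
    """Step-times until the last instruction completes.

    start_interval is the number of steps between successive instruction
    starts: 7 for no overlap, 6 when S6 overlaps the next S0, 1 when fully
    pipelined.
    """
    if n_instructions == 0:
        return 0
    return n_steps + (n_instructions - 1) * start_interval


non_pipelined = completion_time(3, start_interval=7)   # Figure 5-1 case
partially = completion_time(3, start_interval=6)       # Figure 5-2 case
fully = completion_time(3, start_interval=1)           # Figure 5-3 case

# For a long run, the throughput ratio approaches the number of steps (7).
speedup = completion_time(1000, start_interval=7) / completion_time(1000, start_interval=1)
```

Each instruction still takes seven step-times from start to finish in every case; only the spacing between instruction starts changes.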



5.2.2 Pipeline Flow 

A real-world form of a pipeline is an automobile assembly line. At each station of the assembly 
line (called segments of the pipeline in our case), a task is performed on the partially completed
automobile and the result is passed on to the next station. At the end of the assembly line, the 
automobile is complete. 

In an instruction pipeline, as in an assembly line, each segment is responsible for performing a 
task and passing the completed result to the next segment. The exact task to be performed in 
each pipeline segment is a function of the degree of pipelining implemented and the complexity 
of the instruction set. 

One attribute of an automobile assembly line is equally important to an instruction pipeline: 
smooth and continuous flow. An automobile assembly line works well because the tasks to be 
performed at each station take about the same amount of time. This keeps the line moving at a 
constant pace, with no starts and stops which would reduce the number of completed automobiles 
per unit time. 

An analogous situation exists in an instruction pipeline. In order to achieve real efficiency in 
an instruction pipeline, information must flow smoothly and continuously from the start of the 
pipeline to the end. If a pipeline segment somewhere in the middle is not able to supply results 
to the next segment of the pipeline, the entire pipeline after the offending segment must stop, or 
stall, until the segment can supply a result. 

In the general case, a pipeline stall results when a pipeline segment can not supply a result to
the next segment, or when it can not accept a new result from a previous segment.
This is a fundamental problem with most instruction pipelines because they occasionally (or not
so occasionally) stall. Stalls result in decreased instruction throughput because the smooth flow
of the pipeline is broken.

A typical example of a pipeline stall involves memory reads. A simple three-segment pipeline 
might fetch operands in segment 1, use the operands to compute results in segment 2, and make 
memory references or store results in segment 3, as shown in Figure 5-4. 

Figure 5-4: Simple Three-Segment Pipeline

+---------+    +-------------+    +--------+
| Operand | -> | Computation | -> | Memory |
| Access  |    |             |    | Read   |
+---------+    +-------------+    +--------+



Figure 5-5 illustrates what happens when the pipeline control wants to use the result of the 
memory read as an operand. 

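Figure 5-5: Information Flow Against the Pipeline

I1      +---------+    +-------------+    +--------+
        | Operand | -> | Computation | -> | Memory |---+
        | Access  |    |             |    | Read   |   |
        +---------+    +-------------+    +--------+   |
                                                       |
     +-------------------------------------------------+
     |
     |  +---------+    +-------------+    +--------+
I2   +->| Operand | -> | Computation | -> | Result |
        | Access  |    |             |    | Store  |
        +---------+    +-------------+    +--------+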



In this case, the operand access segment of I2 can not supply an operand to the computation
segment because the memory read done by I1 has not yet completed. As a result, the pipeline
must stall until the memory read has completed. This is shown in Figure 5-6.



Figure 5-6: Stalls Introduced by Backward Pipeline Flow

[Diagram not recoverable. It shows I1 passing through the Operand Access, Computation, and
Memory Read segments while the Operand Access segment of I2 stalls, after which I2 proceeds
through Computation and Result Store.]

In this diagram, the memory read data from I1 is not available until the read request passes
through segment 3 of the pipeline. But the operand access segment for I2 wants the data
immediately. The result is that the operand access segment of I2 has to stall twice waiting for the
memory read data to become available. This, in turn, stalls the rest of the pipeline segments
after the operand access segment.

This situation is an excellent example of an age-old problem with instruction pipelining. The 
natural and desired direction of information flow in a pipeline is from left to right in the above 
diagrams. In this case, information must flow from the output of the memory read segment into 
the operand access segment. This requires a right-to-left movement of information from a later 
pipeline segment to an earlier one. In general, any information transfer which goes against the 
normal flow of the pipeline has the potential for causing pipeline stalls. 
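
The two-cycle stall in this example can be reproduced with a small calculation. The cycle-numbering convention (an instruction occupies segment k in cycle start + k - 1, and a result becomes usable in the cycle after its producing segment) is an assumption made for this sketch.

```python
def operand_stall_cycles(producer_start: int, result_segment: int, consumer_start: int) -> int:
    """Cycles the consumer's operand access must stall.

    producer_start  - cycle in which the producer enters segment 1
    result_segment  - segment whose completion makes the data available
    consumer_start  - cycle in which the consumer first wants the operand
    """
    data_ready_cycle = producer_start + result_segment
    return max(0, data_ready_cycle - consumer_start)


# I1 enters the pipeline in cycle 1 and reads memory in segment 3.
# I2 follows one cycle later and wants the read data in its operand access.
stalls = operand_stall_cycles(producer_start=1, result_segment=3, consumer_start=2)
```

Moving the consumer further behind the producer reduces the stall count, reaching zero once the consumer starts three or more cycles after the producer.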

5.2.3 Stalls and Exceptions in an Instruction Pipeline 

Even the best pipeline design must be prepared to deal with stalls and exceptions created in the 
pipeline. As mentioned above, a stall is a condition in which a pipeline segment can not accept 
a new result from a previous segment, or can not send a result to a new segment. An exception 
occurs when a pipeline segment detects an abnormal condition which must stop, and then drain 
the pipeline. Examples of exceptions are: memory management faults, reserved operand faults, 
and arithmetic overflows. One of the inherent costs of a pipelined implementation is the extra 
logic necessary to deal with stalls and exceptions. 

There are two primary considerations concerning stalls: what action to take when one occurs, 
and how to minimize them in the first place. The design of most instruction pipelines assumes 
that the pipeline will not stall, and handles the stall condition as a special case, rather than 
the other way around. This means that each segment of the pipeline performs its function and 
produces a result each cycle. If a stall occurs just before the end of the cycle, the segment must 
block global state updates and repeat the same operation during the next cycle. The design of
the pipeline control must take this into account and be prepared to handle the condition. 

A common stall condition occurs when each pipeline segment has the same average speed, but 
different peak speeds. For example, a pipeline segment whose task is to perform both memory 
references and register result stores may take longer to perform memory references than result 
stores. This can cause earlier segments of the pipeline to stall because the segment can not 
take new inputs as fast if it is doing a memory reference rather than a result store. A common 
technique to minimize this problem is to place buffers between pipeline segments, as shown in 
Figure 5-7. 

Figure 5-7: Buffers Between Pipeline Segments

+---------+    +--------+    +-------------+    +--------+    +--------+
| Operand | -> | Buffer | -> | Computation | -> | Buffer | -> | Memory |
| Access  |    |        |    |             |    |        |    | Read   |
+---------+    +--------+    +-------------+    +--------+    +--------+

By placing a buffer of sufficient depth between each segment of the pipeline, segments of differing
peak speeds can avoid stalls caused if the next segment is unable to accept a new result. Instead,
the result goes into the inter-segment buffer and the next segment removes it from the buffer
when it needs it. Unfortunately, adding such buffers means that additional logic must also be
added to handle the buffer full/buffer empty conditions.

The performance advantage of an instruction pipeline comes from the parallelism built into the
pipeline. If the parallelism is defeated by, for example, a stall, the advantage starts to drop. One
problem associated with pipelines is that they can provide "lumpy" performance. That is, two
similar programs may experience radically different performance if one causes many more stalls
(which defeat the parallelism of the pipeline) than the other.
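
The inter-segment buffer of Figure 5-7, together with its buffer full and buffer empty conditions, can be sketched as a bounded queue. The class and method names are hypothetical, and the depth is a free parameter of the example.

```python
from collections import deque


class SegmentBuffer:
    """Bounded FIFO placed between two pipeline segments."""

    def __init__(self, depth: int):
        self.depth = depth
        self.items = deque()

    def is_full(self) -> bool:
        return len(self.items) >= self.depth

    def is_empty(self) -> bool:
        return not self.items

    def put(self, result) -> bool:
        """Producer side: False means buffer full, so the producer must stall."""
        if self.is_full():
            return False
        self.items.append(result)
        return True

    def get(self):
        """Consumer side: None means buffer empty, so the consumer must wait."""
        if self.is_empty():
            return None
        return self.items.popleft()
```

The producer stalls only when put returns False, so a short burst of slow operations in the consuming segment is absorbed by the buffer instead of propagating backward.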

Pipeline exceptions are different from stalls in that exceptions cause the pipeline to empty or 
drain. Usually, everything that entered the pipeline before the point of error is allowed to com- 
plete. Everything that entered the pipeline after the point of error is prevented from completing. 
This can add considerable complexity to the pipeline control. 

A larger problem occurs when the designer wants exceptions to be recoverable. Consider an 
exception caused by a memory management fault. On the VAX, this condition can occur because 
of a TB miss. The correct response to this fault is to read a PTE from memory, refill the TB, and 
restart the request that caused the fault. This can add considerable complexity to the design. 

5.3 NVAX Plus CPU Pipeline Overview 

The remainder of this chapter discusses the NVAX Plus CPU pipeline, which is shown as a block 
diagram in Figure 5-8. This is a high-level view of the CPU and abstracts many of the details. 
For a more detailed view of the pipeline, users are encouraged to refer to the individual box 
chapters in this specification. 

The pipeline is divided into seven segments denoted as "S0" through "S6". In Figure 5-8, the
components of each section of the CPU are shown in the segment of the pipeline in which they 
operate. 

The NVAX Plus CPU is fully pipelined and, as such, is most similar to the abstract example 
shown in Figure 5-3. In addition to the overall macroinstruction pipeline, in which multiple 
macroinstructions are processed in the various segments of the pipeline, most of the sections also 
micropipeline operations. That is, if more than one operation is required to process a macroin- 
struction, the multiple operations are also pipelined within a section. 

5.3.1 Normal Macroinstruction Execution 

Execution of macroinstructions in the NVAX pipeline is decomposed into many smaller steps 
which are the distributed responsibility of the various sections of the chip. Because the NVAX 
Plus CPU implements a macroinstruction pipeline, each section is relatively autonomous, with 
queues inserted between the sections to normalize the processing rates of each section. 

5.3.1.1 The Ibox

The Ibox is responsible for fetching instruction stream data for the next instruction, decomposing
the data into opcode and specifiers, and evaluating the specifiers with the goal of prefetching
operands to support Ebox execution of the instruction.

The Ibox is distributed across segments S0 through S3 of the pipeline, with most of the work
being done in S1. In S0, instruction stream data is fetched from the virtual instruction cache
(VIC) using the address contained in the virtual instruction buffer address register (VIBA). The
data is written into the prefetch queue (PFQ) and VIBA is incremented to the next location.

In segment S1, the PFQ is read and the burst unit uses internal state and the contents of
the IROM to select the next instruction stream component, either an opcode or specifier. This
decoding process is known as bursting. Some instruction components take multiple cycles to
burst. For example, FD opcodes require two burst cycles: one for the FD byte, and one for the
second opcode byte. Similarly, indexed specifiers require at least two burst cycles: one for the
index byte, and one or more for the base specifier.

When an opcode is decoded, the information is passed to the issue unit, which consults the IROM 
for the initial Ebox control store address of the routine which will process the instruction. The 
issue unit sends the address and other instruction-related information to the instruction queue 
where it is held until the Ebox reaches the instruction. 

When a specifier is decoded, the information is passed to the source and destination queue allo- 
cation logic and, potentially, to the complex specifier pipeline. The source and destination queue 
allocation logic allocates the appropriate number of entries for the specifier in the source and 
destination queues in the Ebox. These queues contain pointers to operands and results, and are 
discussed in more detail below. 

If the specifier is not a short literal or register specifier, which are collectively known as simple 
specifiers, it is considered to be a complex specifier and is processed by the small 
microcode-controlled complex specifier unit (CSU), which is distributed in segments S1 (control 
store access), S2 (operand access, including register file read), and S3 (ALU operation, Mbox 
request, GPR write) of the pipeline. The CSU pipeline computes all specifier memory addresses, and 
makes the appropriate request to the Mbox for the specifier type. To avoid reading or writing a GPR 
which is interlocked by a pending Ebox reference, the CSU pipeline includes a register scoreboard 
which detects data dependencies. The CSU pipeline also provides additional help to the Ebox by 
supplying operand information that is not an explicit part of the instruction stream. For example, 
the PC is supplied as an implicit operand for instructions that require it (such as BSBB). 

The branch prediction unit (BPU) watches each opcode that is decoded looking for conditional 
and unconditional branches. For unconditional branches, the BPU calculates the target PC and 
redirects PC and VIBA to the new path. For conditional branches, the BPU predicts whether 
the instruction will branch or not based on previous history. If the prediction indicates that the 
branch will be taken, PC and VIBA are redirected to the new path. The BPU writes the conditional 
branch prediction flag into the branch queue in the Ebox, to be used by the Ebox in the execution 
of the instruction. The BPU maintains enough state to restore the correct instruction PC if the 
prediction turns out to be incorrect. 
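
The redirect-and-restore behavior of the BPU can be sketched as follows. This is a minimal illustration under stated assumptions, not the NVAX Plus logic: the class and method names are hypothetical, and the prediction is passed in as an argument because the history mechanism is not described in this chapter. The limit of one unresolved conditional branch is taken from the S1 stall description in Section 5.3.2.2.

```python
class BranchPredictionUnit:
    """Hypothetical model that tracks one unresolved conditional branch."""

    def __init__(self):
        self.alternate_pc = None      # path NOT followed by the prediction
        self.predicted_taken = None

    def busy(self):
        # A second conditional branch cannot be accepted until the
        # first one has been resolved by the Ebox.
        return self.alternate_pc is not None

    def predict(self, fallthrough_pc, target_pc, predicted_taken):
        """Return the PC the Ibox follows, saving the other path."""
        if self.busy():
            raise RuntimeError("burst unit must stall: branch unresolved")
        self.predicted_taken = predicted_taken
        if predicted_taken:
            self.alternate_pc = fallthrough_pc
            return target_pc
        self.alternate_pc = target_pc
        return fallthrough_pc

    def resolve(self, actually_taken):
        """Return the PC to restore on a misprediction, else None."""
        restore = None
        if actually_taken != self.predicted_taken:
            restore = self.alternate_pc
        self.alternate_pc = None
        self.predicted_taken = None
        return restore
```

In this sketch, a correct prediction costs nothing at resolution time, while an incorrect one returns the saved alternate PC so that instruction fetch can be redirected.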

5.3.1.2 The Microsequencer 

The microsequencer operates in segment S2 of the pipeline and is responsible for supplying to 
the Ebox the next microinstruction to execute. If a macroinstruction requires the execution of 
more than one microinstruction, the microsequencer supplies each microinstruction in sequence 
based on directives included in the previous microinstruction. 

At macroinstruction boundaries, the microsequencer removes the next entry from the instruction 
queue, which includes the initial microinstruction address for the macroinstruction. If the 
instruction queue is empty, the microsequencer supplies the address of a special no-op 
microinstruction. 
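
The selection made at a macroinstruction boundary can be sketched as follows. The function name and the value of NOP_ADDRESS are hypothetical; the real control store address of the no-op microinstruction is not given here.

```python
from collections import deque

# Hypothetical control store address of the special no-op microinstruction.
NOP_ADDRESS = 0x000

def next_macro_entry(instruction_queue):
    """Return the initial microinstruction address for the next
    macroinstruction, or the no-op address if the queue is empty."""
    if instruction_queue:
        return instruction_queue.popleft()
    return NOP_ADDRESS
```

For example, with two entries queued, the first two calls return those entries in order and a third call returns NOP_ADDRESS.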

The microsequencer is also responsible for evaluating all exception requests, and for providing 
a pipeline flush control signal to the Ebox. For certain exceptions and interrupts, the microse- 
quencer injects the address of a special microinstruction handler that is used to respond to the 
event. 

5.3.1.3 The Ebox 

The Ebox is responsible for executing all of the non-floating point instructions, for delivery of 
operands to and receipt of results from the Fbox, and for handling non-instruction events such as 
interrupts and exceptions. The Ebox is distributed through segments S3 (operand access, includ- 
ing register file read), S4 (ALU and shifter operation, Rmux request), and S5 (Rmux completion, 
register write, completion of Mbox request) of the pipeline. 

For the most part, instruction operands are prefetched by the Ibox, and addressed indirectly 
through the source queue. The source queue contains the operand itself for short literal specifiers, 
and a pointer to an entry in the register file for other operand types. 

An entry in the field queue is made when a field-type specifier entry is made into the source queue. 
The field queue provides microbranch conditions that allow the Ebox microcode to determine if 
a field-type specifier addresses either a GPR or memory. A microbranch on a valid field queue 
entry retires the entry from the queue. 

The register file is divided into four parts: the GPRs, memory data (MD) registers, working 
registers, and CPU state registers. For register-mode specifiers, the source queue points to the 
appropriate GPR in the register file. For other non-short literal specifier modes, the source queue 
points to an MD register. The MD register is either written directly by the Ibox, or by the Mbox 
as the result of a memory read generated by the Ibox. 
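
The indirection through the source queue can be sketched as follows. This is an illustration only: the entry layout and the register file keys ("R3", "MD0") are hypothetical names chosen for the example.

```python
from dataclasses import dataclass

@dataclass
class SourceQueueEntry:
    is_short_literal: bool
    value: object   # the literal itself, or a key into the register file

def read_operand(entry, register_file):
    """Resolve a source queue entry to an operand value."""
    if entry.is_short_literal:
        # Short literal: the operand is held in the queue entry itself.
        return entry.value
    # Otherwise the entry points at a GPR (register mode) or at an
    # MD register (all other specifier modes).
    return register_file[entry.value]
```

With register_file = {"R3": 42, "MD0": 7}, a short literal entry holding 5 yields 5, an entry pointing at "R3" yields 42, and an entry pointing at "MD0" yields 7.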

The S3 segment of the Ebox pipeline is responsible for selecting the appropriate operands for the 
Ebox and Fbox execution of instructions. Operands are selected onto E%ABUS and E%BBUS for 
use in both the Ebox and Fbox. In most instances, these operands come from the register file, 
although there are other data path sources of non-instruction operands (such as the PSL). 

Ebox computation is done by the ALU and the shifter in the S4 segment of the pipeline on 
operands supplied by the S3 segment. Control for these units is supplied by the microinstruction 
which was originally supplied to the S3 segment by the microsequencer, and then subsequently 
moved forward in the pipeline. 

The S4 segment also contains the RMUX, whose responsibility is to select results from either 
the Ebox or Fbox and perform the appropriate register or memory operation. The RMUX inputs 
come from the ALU, shifter, and F%RESULT at the end of the cycle. The RMUX actually spans the 
S4/S5 boundary such that its outputs are valid at the beginning of the S5 segment. The RMUX 
is controlled by the retire queue, which specifies the source (either Ebox or Fbox) of the result 
to be processed (or retired) next. Non-selected RMUX sources are delayed until the retire queue 
indicates that they should be processed. 

As the source queue points to instruction operands, so the destination queue points to the 
destination for instruction results. If the result is to be stored in a GPR, the destination queue 
contains a pointer to the appropriate GPR. If the result is to be stored in memory, the destination 
queue indicates that a request is to be made to the Mbox, which contains the physical address of 
the result in the PA queue (which is described below). This information is supplied as a control 
input to the RMUX logic. 

Once the RMUX selects the appropriate source of result information, it either requests Mbox 
service, or sends the result onto E%WBUS to be written back to the register file or to other data 
path registers in the S5 segment of the pipeline. The interface between the Ebox and Mbox for 
all memory requests is the EM_LATCH, which contains control information and may contain an 
address, data, or both, depending on the type of request. In addition to operands and results that 
are prefetched by the Ibox, the Ebox can also make explicit memory requests to the Mbox to read 
or write data. 

5.3.1.4 The Fbox 

The Fbox is responsible for executing all of the floating point instructions in the VAX base 
instruction group, as well as the longword-length integer multiply instructions. 

For each instruction that the Fbox is to execute, it receives from the microsequencer the opcode 
and other instruction-related information. The Fbox receives operand data from the Ebox on 
E%ABUS and E%BBUS. 

Execution of instructions is performed in a dedicated Fbox pipeline that appears in segment S4 
of Figure 5-8, but is actually a minimum of three cycles in length. Certain instructions, such 
as integer multiply, may require multiple passes through some segments of the Fbox pipeline. 
Other instructions, such as divide, are not pipelined at all. 

Fbox results and status are returned via F%RESULT to the RMUX in the Ebox for retirement. 
When the instruction is next to retire, the RMUX hardware, as directed by the destination 
queue, sends the results to either the GPRs for register destinations, or to the Mbox for memory 
destinations. 

5.3.1.5 The Mbox 

The Mbox operates in the S5 and S6 segments of the pipeline, and is responsible for all memory 
references initiated by the other sections of the chip. Mbox requests can come from the Ibox 
(for VIC fills and for specifier references), the Ebox or Fbox via the RMUX and the EM_LATCH 
(for instruction result stores and for explicit Ebox memory requests), from the Mbox itself (for 
translation buffer fills and PTE reads), and from the Cbox (for invalidates and cache fills). 

All virtual references are translated to a physical address by the translation buffer (TB), which 
operates in the S5 segment of the pipeline. For instruction result references generated by the 
Ibox, the translated address is stored in the physical address queue (PA queue). These addresses 
are later matched with data from the Ebox or Fbox, when the result is calculated. 

For memory references, the physical address from either the TB or the PA queue is used to 
address the primary cache (Pcache) starting in the S5 segment of the pipeline and continuing 
into the S6 segment. Read data is available in the middle of the S6 segment, right-justified and 
returned to the requester on M%MD_BUS by the end of the cycle. Writes are also completed by 
the end of the cycle. Although the Pcache access spans the S5 and S6 segments of the pipeline, 
a new access can be started each cycle in the absence of a TB or cache miss. 

5.3.1.6 The Cbox 

The Cbox is responsible for accessing the backup cache (Bcache), and for memory requests. The 
Cbox receives input from the Mbox in the S6 segment of the pipeline, and usually takes multiple 
cycles to complete a request. For this reason, the Cbox is not shown in specific pipeline segments. 

If a memory read misses in the Pcache, the request is sent to the Cbox for processing. The Cbox 
first looks for the data in the Bcache and fills the Pcache from the Bcache if the data is present. 
If the data is not present in the Bcache, the Cbox requests a cache fill from the system. When 
the system returns the data, it is written to the Pcache (and potentially to the VIC). Although 
Pcache fills are done by making a request to the Mbox pipeline, data is returned to the original 
requester as quickly as possible by driving data directly onto B%S6_DATA, and from there onto 
M%MD_BUS as soon as the bus is free. 
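
The order in which a read is satisfied can be sketched as follows. This is a simplified model, assuming each cache is a dictionary keyed by block address. It does not model timing, the VIC, or how the Bcache itself is filled, which is handled outside this sketch.

```python
def read(address, pcache, bcache, system_memory):
    """Return (data, source) for a memory read, filling the Pcache on a miss."""
    if address in pcache:
        return pcache[address], "Pcache"
    if address in bcache:
        # Pcache miss, Bcache hit: the Cbox fills the Pcache from the Bcache.
        data, source = bcache[address], "Bcache"
    else:
        # Miss in both caches: the Cbox requests a fill from the system.
        data, source = system_memory[address], "system"
    pcache[address] = data
    return data, source
```

A second read of the same address is then satisfied by the Pcache without involving the Cbox.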

Because the Pcache operates as a write-through cache, all memory writes are passed to the Cbox. 
To avoid multiple writes to the same Bcache block, the Cbox contains a write buffer in which 
multiple writes to the same quadwords are packed. If possible, two quadwords (an octaword) are 
assembled together before the Bcache is actually written. 
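
Write packing can be sketched as follows. This is an illustration under assumptions: the class, its methods, and the byte-level bookkeeping are hypothetical, and the number of write buffer entries and the conditions that trigger a Bcache write are not modeled.

```python
class WriteBuffer:
    """Hypothetical model of write packing by quadword and octaword."""

    def __init__(self):
        # quadword-aligned address -> {byte offset within quadword: value}
        self.entries = {}

    def write(self, address, data):
        """Merge the bytes of a write into the entry for its quadword."""
        for i, byte in enumerate(data):
            a = address + i
            self.entries.setdefault(a & ~0x7, {})[a & 0x7] = byte

    def drain(self):
        """Group pending quadwords into octawords and empty the buffer.
        Returns {octaword address: {byte offset within octaword: value}}."""
        octawords = {}
        for qw_addr, qw_bytes in sorted(self.entries.items()):
            base = qw_addr & ~0xF
            for offset, value in qw_bytes.items():
                octawords.setdefault(base, {})[(qw_addr - base) + offset] = value
        self.entries.clear()
        return octawords
```

In this model, two writes to the same quadword occupy a single entry, and adjacent quadwords in the same octaword are delivered together by drain().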

5.3.2 Stalls in the Pipeline 

Despite our best attempts at keeping the pipeline flowing smoothly, there are conditions which 
cause segments of the pipeline to stall. Conceptually, each segment of the pipeline can be 
considered as a black box which performs three steps every cycle: 

1. The task appropriate to the pipeline segment is performed, using control and inputs from the 
previous pipeline segment. The segment then updates local state (within the segment), but 
not global state (outside of the segment). 

2. Just before the end of the cycle, all segments send stall conditions to the appropriate state 
sequencer for that segment, which evaluates the conditions and determines which, if any, 
pipeline segments must stall. 

3. If no stall conditions exist for a pipeline segment, the state sequencer allows it to pass results 
to the next segment and accept results from the previous segment. This is accomplished by 
updating global state. 

This sequence of steps maximizes throughput by allowing each pipeline segment to assume that 
a stall will not occur (which should be the common case). If a stall does occur at the end of 
the cycle, global state updates are blocked, and the stalled segment repeats the same task (with 
potentially different inputs) in the next cycle (and the next, and the next) until the stall condition 
is removed. 

This description is over-simplified in some cases because some global state must be updated by a 
segment before the stall condition is known. Also, some tasks must be performed by a segment 
once and only once. These are treated specially on a case-by-case basis in each segment. 

Within a particular section of the chip, a stall in one pipeline segment also causes stalls in all 
upstream segments (those that occur earlier in the pipeline) of the pipeline. Unlike Rigel, stalls 
in one segment of the pipeline do not cause stalls in downstream segments of the pipeline. For 
example, a memory data stall in Rigel also caused a stall of the downstream ALU segment. In 
NVAX Plus, a memory data stall does not stall the ALU segment (a no-op is inserted into the S4 
segment when S4 advances to S5). 

There are a number of stall conditions in the chip which result in a pipeline stall. Each is 
discussed briefly below and in much more detail in the appropriate chapter of this specification. 
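
The upstream-only stall rule described in this section can be sketched as follows. This is a simplified model of a single section, assuming one work item per segment. The function name and list representation are hypothetical, and the special cases noted above (global state updated before the stall is known, tasks performed once and only once) are not modeled.

```python
NOP = "no-op"

def advance(pipeline, stalled, incoming):
    """Advance a pipeline by one cycle.

    pipeline[i] is the work in segment i (index 0 is the earliest segment).
    stalled is the set of segment indexes that reported a stall this cycle.
    Returns (new_pipeline, retired), where retired is the item that left
    the last segment, or None if the last segment was held.
    """
    n = len(pipeline)
    hold = max(stalled) if stalled else -1   # segments 0..hold keep their work
    new = list(pipeline)
    retired = pipeline[-1] if hold < n - 1 else None
    # Segments downstream of the stall advance normally.
    for i in range(n - 1, hold + 1, -1):
        new[i] = pipeline[i - 1]
    if hold == -1:
        new[0] = incoming        # no stall: accept new work
    elif hold + 1 < n:
        new[hold + 1] = NOP      # bubble inserted behind the advancing work
    return new, retired
```

For example, with pipeline ["D", "C", "B", "A"] and a stall in segment 1, segments 0 and 1 hold "D" and "C", a no-op enters segment 2, "B" advances to segment 3, and "A" leaves the pipeline.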

5.3.2.1 S0 Stalls 

Stalls that occur in the S0 segment of the pipeline are as follows: 
Ibox: 

• PFQ full: In normal operation, the VIC is accessed using the address in VIBA, the data is 
sent to the prefetch queue, and VIBA is incremented. If the PFQ is full, the increment of 
VIBA is blocked, and the data is re-referenced in the VIC until there is room for it in the 
PFQ. At that point, prefetch resumes. 
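
The PFQ full stall can be sketched as follows. The function and its parameters are hypothetical, and the PFQ capacity and the number of bytes fetched per access are passed in as arguments because they are not stated here.

```python
def s0_prefetch(viba, pfq, vic, pfq_capacity, fetch_bytes):
    """Model one S0 cycle and return the new value of VIBA."""
    data = vic[viba]                 # the VIC is accessed every cycle
    if len(pfq) >= pfq_capacity:
        # PFQ full: the VIBA increment is blocked, so the same data
        # is referenced again in the next cycle.
        return viba
    pfq.append(data)
    return viba + fetch_bytes
```

While the PFQ is full the function returns an unchanged VIBA. Once an entry is removed, the next call writes the re-referenced data into the PFQ and prefetch resumes.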

5.3.2.2 S1 Stalls 

Stalls that occur in the S1 segment of the pipeline are as follows: 
Ibox: 

• Insufficient PFQ data: The burst unit attempts to decode the next instruction component 
each cycle. If there are insufficient PFQ bytes valid to decode the entire component, the burst 
unit stalls until the required bytes are delivered from the VIC. 

• Source queue or destination queue full: During specifier decoding, the source and destination 
queue allocation logic must allocate enough entries in each queue to satisfy the requirements 
of the specifier being parsed. To guarantee that there will be sufficient resources available, 
there must be at least 2 free source queue entries and 2 free destination queue entries to 
complete the burst of the specifier. If there are insufficient free entries in either queue, the 
burst unit stalls until free entries become available. 

• MD file full: When a complex specifier is decoded, the source queue allocation logic must 
allocate enough memory data registers in the register file to satisfy the requirements of the 
specifier being parsed. To guarantee that there will be sufficient resources available, there 
must be at least 2 free memory data registers available to complete the burst of the specifier. 
If there are insufficient free registers, the burst unit stalls until enough memory data registers 
become available. 

• Second conditional branch decoded: The branch prediction unit predicts the path that each 
conditional branch will take and redirects the instruction stream based on that prediction. It 
retains sufficient state to restore the alternate path if the prediction was wrong. If a second 
conditional branch is decoded before the first is resolved by the Ebox, the branch prediction 
unit has nowhere to store the state, so the burst unit stalls until the Ebox resolves the actual 
direction of the first branch. 

• Instruction queue full: When a new opcode is decoded by the burst unit, the issue unit 
attempts to add an entry for the instruction to the instruction queue. If there are no free 
entries in the instruction queue, the burst unit stalls until a free entry becomes available, 
which occurs when an instruction is retired through the RMUX. 

• Complex specifier unit busy: If the burst unit decodes an instruction component that must 
be processed by the CSU pipeline, it makes a request for service by the CSU through an S1 
request latch. If this latch is still valid from a previous request for service (either due to a 
multi-cycle flow or a CSU stall), the burst unit stalls until the valid bit in the request latch 
is cleared. 

• Immediate data length not available: The length of the specifier extension for immediate 
specifiers is dependent on the data length of the specifier for that specific instruction. The 
data length information comes from one of the Ibox instruction PLAs which is accessed based 
on the opcode of the instruction. If the PLA access is not complete before an immediate 
specifier is decoded (which would have to be the first specifier of the instruction), the burst 
unit stalls for one cycle. 
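
The S1 stall conditions listed above can be collected into a single check, sketched as follows. The S1State fields are hypothetical names for the quantities the text refers to. Only the thresholds (2 free source queue entries, 2 free destination queue entries, 2 free memory data registers) come from the text, and the immediate data length stall is omitted for brevity.

```python
from dataclasses import dataclass

@dataclass
class S1State:
    kind: str                        # "opcode", "simple" or "complex" (specifier)
    pfq_valid_bytes: int
    needed_bytes: int
    free_src: int = 2                # free source queue entries
    free_dst: int = 2                # free destination queue entries
    free_md: int = 2                 # free memory data registers
    free_iq: int = 1                 # free instruction queue entries
    cond_branch: bool = False        # component is a conditional branch opcode
    branch_unresolved: bool = False  # an earlier conditional branch is pending
    csu_latch_valid: bool = False    # S1 request latch still holds a request

def s1_stall_reasons(s):
    """Return the list of reasons the burst unit must stall this cycle."""
    reasons = []
    if s.pfq_valid_bytes < s.needed_bytes:
        reasons.append("insufficient PFQ data")
    if s.kind in ("simple", "complex") and (s.free_src < 2 or s.free_dst < 2):
        reasons.append("source or destination queue full")
    if s.kind == "complex" and s.free_md < 2:
        reasons.append("MD file full")
    if s.kind == "opcode" and s.cond_branch and s.branch_unresolved:
        reasons.append("second conditional branch decoded")
    if s.kind == "opcode" and s.free_iq < 1:
        reasons.append("instruction queue full")
    if s.kind == "complex" and s.csu_latch_valid:
        reasons.append("complex specifier unit busy")
    return reasons
```

An empty list means the burst unit may proceed. Any non-empty list means the burst of the current component is retried in the next cycle.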



5.3.2.3 S2 Stalls 

Stalls that occur in the S2 segment of the pipeline are as follows: 
Ibox: 

• Outstanding Ebox or Fbox GPR write: In order to calculate certain specifier memory ad- 
dresses, the CSU must read the contents of a GPR from the register file. If there is a pending 
Ebox or Fbox write to the register, the Ibox GPR scoreboard prevents the GPR read by stalling 
the S2 segment of the CSU pipeline. The stall continues until the GPR write completes. 

• Memory data not valid: For certain operations, the Ibox makes an Mbox request to return 
data which is used to complete the operation (e.g., the read done for the indirect address of a 
displacement deferred specifier). The Ibox MD register contains a valid bit which is cleared 
when a request is made, and set when data returns in response to the request. If the Ibox 
references the Ibox MD register when the valid bit is off, the S2 segment of the CSU pipeline 
stalls until the data is returned by the Mbox. 
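
The valid-bit handshake on the Ibox MD register can be sketched as follows. The class and method names are hypothetical; only the clear-on-request, set-on-return, and stall-on-invalid behavior is taken from the text.

```python
class MDRegister:
    """Hypothetical model of a memory data register with a valid bit."""

    def __init__(self):
        self.valid = False
        self.data = None

    def request(self):
        # The valid bit is cleared when the Mbox request is made.
        self.valid = False

    def fill(self, data):
        # The valid bit is set when data returns for the request.
        self.data = data
        self.valid = True

    def read(self):
        """Return (stall, data); stall is True while data is outstanding."""
        if not self.valid:
            return True, None
        return False, self.data
```

A reader that sees stall set to True simply repeats the read in the next cycle, which matches the stall behavior described for the S2 segment of the CSU pipeline.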

Microsequencer: 

• Instruction queue empty: The final microinstruction of a macroinstruction execution flow in 
the Ebox is indicated when a SEQ.MUX/LAST.CYCLE* microinstruction is decoded by the 
microsequencer. In response to this event, the Ebox expects to receive the first microinstruction 
of the next macroinstruction flow based on the initial address in the instruction queue. If the 
instruction queue is empty, the Microsequencer supplies the instruction queue stall microin- 
struction in place of the next macroinstruction flow. In effect, this stalls the microsequencer 
for one cycle. 



5.3.2.4 S3 Stalls 

Stalls that occur in the S3 segment of the pipeline are as follows: 
Ibox: 

• Outstanding Ebox GPR read: In order to complete the processing for auto-increment, auto- 
decrement, and auto-increment deferred specifiers, the CSU must update the GPR with the 
new value. If there is a pending Ebox read to the register through the source queue, the Ibox 
scoreboard prevents the GPR write by stalling the S3 segment of the CSU pipeline. The stall 
continues until the Ebox reads the GPR. 

• Specifier queue full: For most complex specifiers, the CSU makes a request for Mbox service 
for the memory request required by the specifier. If there are no free entries in the specifier 
queue, the S3 segment of the CSU pipeline stalls until a free entry becomes available. 

• RLOG full: Auto-increment, auto-decrement, and auto-increment deferred specifiers require 
a free RLOG entry in which to log the change to the GPR. If there are no free RLOG entries 
when such a specifier is decoded, the S3 segment of the CSU pipeline stalls until a free entry 
becomes available. 
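
The role of the RLOG can be sketched as follows. This is an illustration under assumptions: the entry format (register number and amount of change), the capacity argument, and the method names are hypothetical, and the way entries are freed as instructions complete is not modeled.

```python
class RLog:
    """Hypothetical model of a log of GPR changes made by the CSU."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = []            # (register number, amount added)

    def log_update(self, gprs, reg, delta):
        """Apply an auto-increment or auto-decrement and log it.
        Returns False, meaning the CSU must stall, if the log is full."""
        if len(self.entries) >= self.capacity:
            return False
        gprs[reg] += delta
        self.entries.append((reg, delta))
        return True

    def unwind(self, gprs):
        """Back out every logged change, newest first."""
        for reg, delta in reversed(self.entries):
            gprs[reg] -= delta
        self.entries.clear()
```

The unwind() method corresponds to the backing out of GPR modifications that is performed when an exception is taken.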

Ebox: 

• Memory read data not valid: In some instances, the Ebox may make an explicit read request 
to the Mbox to return data in one of the 6 Ebox working registers in the register file. When 
the request is made, the valid bit on the register is cleared. When the data is written to the 
register, the valid bit is set. If the Ebox references the working register when the valid bit is 
clear, the S3 segment of the Ebox pipeline stalls until the entry becomes valid. 

• Field queue not valid: For each macroinstruction that includes a field-type specifier, the 
microcode microbranches on the first entry in the field queue to determine whether the field 
specifier addresses a GPR or memory. If the field queue is empty (indicating that the Ibox 
has not yet parsed the field specifier), the result of the next address calculation repeats the 
microbranch the next cycle. Although this is not a true stall, the effects are the same in that 
a microinstruction is repeated until the field queue becomes valid. 

• Outstanding Fbox GPR write: Because the Fbox computation pipeline is multiple cycles long, 
the Ebox may start to process subsequent instructions before the Fbox completes the first. 
If the Fbox instruction result is destined for a GPR that is referenced by a subsequent Ebox 
microword, the S3 segment of the Ebox pipeline stalls until the Fbox GPR write occurs. 

• Fbox instruction queue full: When an instruction is issued to the Fbox, an entry is added to 
the Fbox instruction queue. If there are no free entries in the queue, the S3 segment of the 
Ebox pipeline stalls until a free entry becomes available. 

Ebox/Fbox: 

• Source queue empty: Most instruction operands are prefetched by the Ibox, which writes 
a pointer to the operand value into the source queue. The Ebox then references up to two 
operands per cycle indirectly through the source queue for delivery to the Ebox or Fbox. If 
either of the source queue entries referenced is not valid, the S3 segment of the Ebox pipeline 
stalls until the entry becomes valid. 

• Memory operand not valid: Memory operands are prefetched by the Ibox, and the data is 
written by either the Mbox or Ibox into the memory data registers in the register file. If 
a referenced source queue entry points to a memory data register which is not valid, the S3 
segment of the Ebox pipeline stalls until the entry becomes valid. 

5.3.2.5 S4 Stalls 

Stalls that occur in the S4 segment of the pipeline are as follows: 
Ebox: 

• Branch queue empty: When a conditional or unconditional branch is decoded by the Ibox, an 
entry is added to the branch queue. For conditional branch instructions, the entry indicates 
the Ibox prediction of the branch direction. The branch queue is referenced by the Ebox to 
verify that the branch displacement was valid, and to compare the actual branch direction 
with the prediction. If the branch queue entry has not yet been made by the Ibox, the S4 
segment of the Ebox pipeline stalls until the entry is made. 

• Fbox GPR operand scoreboard full: The Ebox implements a register scoreboard to prevent 
the Ebox from reading a GPR to which there is an outstanding write by the Fbox. For each 
Fbox instruction which will write a GPR result, the Ebox adds an entry to the Fbox GPR 
scoreboard. If the scoreboard is full when the Ebox attempts to add an entry, the S4 segment 
of the Ebox pipeline stalls until a free entry becomes available. 

Fbox: 

• Fbox operand not valid: Instructions are issued to the Fbox when the opcode is removed 
from the instruction queue by the microsequencer. Operands for the instruction may not 
arrive until some time later. If the Fbox attempts to start the instruction execution when the 
operands are not yet valid, the Fbox pipeline stalls until the operands become valid. 

Ebox/Fbox: 

• Destination queue empty: Destination specifiers for instructions are processed by the Ibox, 
which writes a pointer to the destination (either GPR or memory) into the destination queue. 
The destination queue is referenced in two cases: when the Ebox or Fbox store instruction 
results via the RMUX, and when the Ebox tries to add the destination of Fbox instructions to 
the Ebox GPR scoreboard. If the destination queue entry is not valid (as would be the case if 
the Ibox has not completed processing the destination specifier), a stall occurs until the entry 
becomes valid. 

• PA queue empty: For memory destination specifiers, the Ibox sends the virtual address of the 
destination to the Mbox, which translates it and adds the physical address to the PA queue. 
If the destination queue indicates that an instruction result is in memory, a store request is 
made to the Mbox which supplies the data for the result. The Mbox matches the data with 
the first address in the PA queue and performs the write. If the PA queue is not valid when 
the Ebox or Fbox has a memory result ready, the RMUX stalls until the entry becomes valid. 
As a result, the source of the RMUX input (Ebox or Fbox) also stalls. 

• EM_LATCH full: All implicit and explicit memory requests made by the Ebox or Fbox pass 
through the EM_LATCH to the Mbox. If the Mbox is still processing the previous request 
when a new request is made, the RMUX stalls until the previous request is completed. As a 
result, the source of the RMUX input (Ebox or Fbox) also stalls. 

• RMUX selected to other source: Macroinstructions must be completed in the order in which 
they appear in the instruction stream. The Ebox retire queue determines whether the next 
instruction to complete comes from the Ebox or the Fbox. If the next instruction should come 
from one source and the other makes an RMUX request, the other source stalls until the 
retire queue indicates that the next instruction should come from that source. 
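
In-order retirement through the RMUX can be sketched as follows. The function name and the representation of the retire queue as a queue of source names are hypothetical. The behavior for an empty retire queue is an assumption of this sketch.

```python
from collections import deque

def rmux_select(retire_queue, ready):
    """Select the next result to retire.

    retire_queue: deque of "Ebox" / "Fbox" in instruction stream order.
    ready: set of sources that have a result waiting at the RMUX.
    Returns (retired source or None, set of sources that stall).
    """
    if not retire_queue:
        return None, set(ready)      # assumption: nothing can retire
    head = retire_queue[0]
    if head in ready:
        retire_queue.popleft()
        return head, set(ready) - {head}
    # The selected source is not ready, so any other requester waits.
    return None, set(ready)
```

For example, if the retire queue holds "Fbox" then "Ebox" and only the Ebox has a result, nothing retires and the Ebox stalls until the Fbox result has been retired.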



5.3.3 Exception Handling 

A pipeline exception occurs when a segment of the pipeline detects an event which requires that 
the normal flow of the pipeline be stopped in favor of another flow. There are two fundamental 
types of pipeline exceptions: those that resume the original pipeline flow once the exception is 
corrected, and those that require the intervention of the operating system. A TB miss on a 
memory reference is an example of the first type, and an access control violation is an example 
of the second type. M=0 faults are handled specially, as described below. 

Restartable exceptions are handled entirely within the confines of the section that detected the 
event. Other exceptions must be reported to the Ebox for processing. Because the NVAX Plus 
CPU is macropipelined, exceptions can be detected by sections of the pipeline long before the 
instruction which caused the exception is actually executed by the Ebox or Fbox. However, the 
reporting of the exception is deferred until the instruction is executed by the Ebox or Fbox. At 
that point, an Ebox handler is invoked to process the event. 

Because the Ebox and Fbox are micropipelined, the point at which an exception handler is in- 
voked must be carefully controlled. For example, three macroinstructions may be in execution in 
segments S3, S4, and S5 of the Ebox pipeline. If an exception is reported for the macroinstruction 
in the S3 segment, the two macroinstructions that are in the S4 and S5 segments must be allowed 
to complete before the exception handler is invoked. 

To accomplish this, the S4/S5 boundary in the Ebox is defined to be the commit point for a 
microinstruction. Architectural state is not modified before the S5 segment of the pipeline, unless 
there is some mechanism for restoring the original state if an exception is detected (the Ibox RLOG 
is an example of such a mechanism). Exception reporting is deferred until the microinstruction 
to which the event belongs attempts to cross the S4/S5 boundary. At that point, the exception 
is reported and an exception handler is invoked. By deferring exception reporting to this point, 
the previous microinstruction (which may belong to the previous macroinstruction) is allowed to 
complete. 

Most exceptions are reported by requesting a microtrap from the Microsequencer. When the 
Microsequencer receives a microtrap request, it causes the Ebox to break all its stalls, aborts the 
Ebox pipeline (by asserting E_USQ%PE_ABORT), and injects the address of a handler for the event 
into the control store address latch. This starts an Ebox microcode routine which will process the 
exception as appropriate. Certain other kinds of exceptions are reported by simply injecting the 
appropriate handler address into the control store at the appropriate point. 

The VAX architecture categorizes exceptions into two types: faults and traps. For both types, the 
microcode handler for the exception causes the Ibox to back out all GPR modifications that are 
in the RLOG, and retrieves the PC from the PC queue. For faults, the PC returned is the PC of 
the opcode of the instruction which caused the exception. For traps, the PC returned is the PC 
of the opcode of the next instruction to execute. The microcode then constructs the appropriate 
exception frame on the stack, and dispatches to the operating system through the appropriate 
SCB vector. 
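
The difference between faults and traps can be sketched as follows. This is a minimal illustration: the function name, the argument names, and the representation of the RLOG as a list of (register, amount) pairs are hypothetical, and construction of the exception frame and the SCB dispatch are not modeled.

```python
def handle_exception(kind, gprs, rlog, opcode_pc, next_opcode_pc):
    """Back out logged GPR changes and return the PC for the exception.

    rlog is a list of (register number, amount added) pairs, oldest first.
    """
    # For both faults and traps, all GPR modifications in the RLOG
    # are backed out, newest first.
    for reg, delta in reversed(rlog):
        gprs[reg] -= delta
    rlog.clear()
    if kind == "fault":
        return opcode_pc         # PC of the instruction that caused the fault
    if kind == "trap":
        return next_opcode_pc    # PC of the next instruction to execute
    raise ValueError("kind must be 'fault' or 'trap'")
```

A fault therefore allows the faulting instruction to be restarted from its opcode, while a trap resumes at the following instruction.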

There are a number of exceptions detected by the NVAX Plus CPU pipeline, each of which is 
discussed briefly below, and in much more detail in the appropriate chapter of this specification. 

5.3.3.1 Interrupts 

The CPU services interrupt requests from various sources between macroinstructions, and at 
selected points within the string instructions. Interrupt requests are received by the interrupt 
section and compared with the current IPL in the PSL. If the interrupt request is for an IPL 
that is higher than the current value in the PSL, a request is posted to the microsequencer. At 
the next macroinstruction boundary, the microsequencer substitutes the address of the microcode 
interrupt service routine for the instruction execution flow. 

The microcode handler then determines if there is actually an interrupt pending. If there is, it 
is dispatched to the operating system through the appropriate SCB vector. 

5.3.3.2 Integer Arithmetic Exceptions 

There are three integer arithmetic exceptions detected by the CPU, all of which are categorized 
as traps by the VAX architecture. This is significant because the event is not reported until after 
the commit point of the instruction, which allows that instruction to complete. 

Integer Overflow Trap 

An integer overflow is detected by the RMUX at the end of the S4 segment of the Ebox 
pipeline. If PSL<IV> is set and overflow traps are enabled by the microcode, the event is 
reported in segment S5 of the pipeline via a microtrap request. 

Integer Divide-By-Zero Trap 

An integer divide-by-zero is detected by the Ebox microcode routine for the instruction. It 
is reported by explicitly retiring the instruction and then jumping directly to the microcode 
handler for the event. 

Subscript Range Trap 

A subscript range trap is detected by the Ebox microcode routine for the INDEX instruction. 
It is reported by explicitly retiring the instruction and then jumping directly to the microcode 
handler for the event. 



5.3.3.3 Floating Point Arithmetic Exceptions 

All floating point arithmetic exceptions are detected by the Fbox pipeline during the execution of 
the instruction. The event is reported by the RMUX when it selects the Fbox as the source of the 
next instruction to process. At that point, a microtrap is requested. 

5.3.3.4 Memory Management Exceptions 

Memory management exceptions are detected by the Mbox when it processes a virtual read or 
write. This section covers actual memory management exceptions such as access control violation, 
translation not valid, and M=0 faults. Translation buffer misses are discussed separately in the 
next section. Because the reporting of memory management exceptions is specific to the operation 
that caused the exception, each case is discussed separately. 

• I-Stream Faults 

While the Ibox is decoding instructions, it may access a page which is not accessible due 
to a memory management exception. This may occur on the opcode, a specifier or specifier 
extension, or on a branch displacement. Should this occur, the Ibox sets a global MME 
fault flag and stops. Memory management exceptions detected on intermediate operations 
during specifier evaluation (such as a read for the indirect address of a displacement deferred 
specifier) are converted by the Ibox into source or destination faults, as described below. 

If the Ebox reaches the instruction which caused the exception (which may not happen due to, 
for example, interrupt, exception, or branch), it will reference one of the queues, which does 
not have a valid entry because the Ibox stopped when the error was detected. The particular 
queue depends on the instruction component on which the error was detected. If the Ibox 
global MME flag is set when an empty queue entry is referenced, the error is reported in one 
of the following ways. 

If the Ibox global MME flag is set when the microsequencer references an invalid instruction 
queue entry, it inserts the instruction queue stall into the pipeline and the Ebox qualifies it 
with the fault flag. When this flag reaches the S4 segment of the pipeline and is selected by 
the RMUX, a microtrap is requested. 

If the Ibox global MME flag is set when the Ebox references an invalid source queue entry, 
a fault flag is injected into either the Ebox or Fbox pipelines, depending on the type of 
instruction. To avoid a deadlock, S3 stalls do not prevent forward progress of the flag in the 
pipeline. When the flag reaches the S4 segment of the pipeline and is selected by the RMUX, 
a microtrap is requested. 

If the Ibox global MME flag is set when the Ebox microcode microbranches on an invalid field 
queue entry, a fault flag is injected into the Ebox pipeline. When the flag reaches the S4 
segment of the pipeline and is selected by the RMUX, a microtrap is requested. 

If the Ibox global MME flag is set when the Ebox references an invalid branch queue entry, 
and the RMUX selects the Ebox, a microtrap is requested. 

If the Ibox global MME flag is set when the RMUX references an invalid destination queue 
entry for a store request, a microtrap is requested. 

• Source Operand Faults 

If the Mbox detects a memory management exception during the translation for a source 
specifier, it qualifies the data returned to the MD file with a fault flag which is written into 
the MD file. When this entry is referenced by the Ebox, a fault flag is injected into the 
pipeline. To avoid a deadlock, S3 stalls do not prevent forward progress of the flag in the 
pipeline. When the flag reaches the S4 segment of the pipeline and is selected by the RMUX, 
a microtrap is requested. 

• Destination Address Faults 

If the Mbox detects a memory management exception during the translation for a destination 
specifier, it sets a fault flag in the PA queue entry for the address. When this entry is 
referenced by the RMUX, a microtrap is requested. 

• Faults on Explicit Ebox Memory Requests 

Explicit Ebox reads and writes are, by definition, performed in the context of the instruction 
which the Ebox is currently executing. If the Mbox detects a memory management exception 
that was the result of an explicit Ebox read or write, it requests an immediate microtrap to 
the memory management fault handler. 

• M=0 faults 

M=0 faults occur when the Mbox finds the M-bit clear in the PTE which is used to translate 
write-type references. The event is reported to the Ebox in one of the three ways described 
above: via the MD file or PA queue fault flags, or via an immediate microtrap for explicit 
Ebox writes. 

Unlike other memory management exceptions, which are dispatched to the operating system, 
M=0 faults are completely processed by the Ebox microcode handler. For normal instructions, 
the handler causes the Ibox to back out all GPR modifications that are in the RLOG and 
retrieves the PC from the PC queue. For string instructions, any RLOG entries that belong 
to the string instructions are not processed, and PSL<FPD> is set. Using the PTE address 
supplied by the Mbox, the Ebox microcode reads the PTE, sets the M-bit, and writes the 
PTE back to memory. The instruction stream is then restarted at the interrupted instruction 
(which may result in special FPD handling, as described below). 
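
The PTE update performed by the M=0 handler can be sketched as a read-modify-write. This is a minimal sketch under stated assumptions: memory is modeled as a hypothetical dictionary keyed by PTE address, the function name is invented for illustration, and the M-bit position (bit 26) is taken from the VAX architecture's PTE format rather than from this section.

```python
PTE_M = 1 << 26  # modify bit in the VAX PTE format (assumed from the VAX architecture)

def set_modify_bit(memory: dict, pte_address: int) -> int:
    """Read the PTE, set the M-bit, and write the PTE back."""
    pte = memory[pte_address]      # read the PTE at the address supplied by the Mbox
    pte |= PTE_M                   # set the M-bit
    memory[pte_address] = pte      # write the PTE back to memory
    return pte
```

The RLOG unwind, PC recovery, and instruction restart described above are outside the scope of this sketch.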



5.3.3.5 Translation Buffer Miss 

Translation buffer misses are handled by the Mbox transparently to the rest of the CPU. When 
a reference misses in the translation buffer, the Mbox aborts the current reference and invokes 
the services of the memory management exception sequencer in the Mbox, which fetches the 
appropriate PTE from memory and loads it into the translation buffer. The original reference is 
then restarted. 
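
The abort, fill, and restart sequence can be illustrated with a small model. It is only a sketch: the class and attribute names are hypothetical, the page table is a plain dictionary keyed by virtual page number, and no replacement policy, access checking, or error handling is modeled.

```python
class TranslationBufferModel:
    """Illustrative TB: a miss fetches the PTE, loads it, and restarts the reference."""

    def __init__(self, page_table: dict):
        self.page_table = page_table  # stands in for PTEs held in memory
        self.entries = {}             # stands in for the translation buffer
        self.misses = 0

    def translate(self, vpn: int) -> int:
        while vpn not in self.entries:
            # Miss: abort the reference, fetch the PTE, load it into the TB.
            self.misses += 1
            self.entries[vpn] = self.page_table[vpn]
            # The loop then restarts the original reference.
        return self.entries[vpn]
```

The caller sees only the returned PTE, which mirrors the statement that misses are transparent to the rest of the CPU.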

5.3.3.6 Reserved Addressing Mode Faults 

Reserved addressing mode faults are detected by the Ibox for certain illegal combinations of 
specifier addressing modes and registers. When one of these combinations is detected, the Ibox 
sets a global addressing mode fault flag that indicates that the condition was detected and stops. 

If the Ibox global addressing mode fault flag is set when the Ebox references an invalid source 
queue entry, a fault flag is injected into either the Ebox or Fbox pipelines, depending on the type 
of instruction. To avoid a deadlock, S3 stalls do not prevent forward progress of the flag in the 
pipeline. The fault flag is carried along the Ebox or Fbox pipeline and passed to the RMUX, 
which reports the event by requesting a microtrap when that source is selected. 

If the Ibox global addressing mode fault flag is set when the Ebox microcode microbranches on 
an invalid field queue entry, a fault flag is injected into the Ebox pipeline. When the flag reaches 
the S4 segment of the pipeline and is selected by the RMUX, a microtrap is requested. 

Similarly, if the Ibox global addressing mode fault flag is set when the RMUX, in response to 
a request by the Ebox or Fbox, references an invalid destination queue entry, a microtrap is 
requested. 

5.3.3.7 Reserved Operand Faults 

Reserved operand faults for floating point operands are detected by the Fbox, and reported in the 
same manner as the floating point arithmetic exceptions described above. 

Other reserved operand faults are detected by Ebox microcode as part of macroinstruction exe- 
cution flows and are reported by jumping directly to the fault handler. 

5.3.3.8 Exceptions Occurring as the Consequence of an Instruction 

Opcode-specific exceptions such as reserved instruction faults, breakpoint faults, etc., are dis- 
patched directly to handlers by placing the address of the handler in the instruction PLA for each 
instruction. 

Other instruction-related faults, such as privileged instruction faults, are detected in execution 
flows by the Ebox microcode and are reported by jumping directly to the fault handler. 

For testability, the Fbox may be disabled. If this is the case, integer multiply instructions are exe- 
cuted by the Ebox microcode and floating point instructions are converted into reserved instruction 
faults for emulation by software. When the first Ebox microinstruction of an Fbox operand flow 
for a floating point macroinstruction reaches the S4 segment of the pipeline, a microtrap is re- 
quested. The handler for this microtrap then jumps directly to the reserved instruction fault 
handler. 

5.3.3.9 Trace Fault 

Trace faults are detected by the microsequencer with some help from the Ebox. The microse- 
quencer maintains a duplicate copy of PSL<TP>, which it updates as required to track the state 
of the PSL copy as it would exist when the instruction is executed by the Ebox. At the end of a 
macroinstruction, the microsequencer logically ORs its local copy of the TP bit with PSL<TP>. If 
either is set, the microsequencer substitutes the address of the microcode trace fault handler for 
the address of the next macroinstruction. 

5.3.3.10 Conditional Branch Mispredict 

When the Ibox decodes a conditional branch, it predicts the path that the branch will take and 
places its prediction into the branch queue. When the Ebox reaches the instruction, it evaluates 
the actual path that the branch took and compares it in the S5 segment of the Ebox pipeline with 
the Ibox prediction. If the two are different, the Ibox is notified that the branch was mispredicted 
and a microtrap request is made to abort the Ebox and Fbox pipelines. The Ibox flushes itself, 
backs out any GPR modifications that are in the RLOG, and redirects the instruction stream to 
the alternate path. The Ebox microcode handler for this event cleans up certain machine state 
and waits for the first instruction from the alternate path. 
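
The comparison of the prediction with the actual branch direction can be sketched as below. This is an illustrative model only: the `BranchQueueEntry` type and its fields are hypothetical, and storing the alternate-path PC alongside the prediction is an assumption made to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BranchQueueEntry:
    predicted_taken: bool  # Ibox prediction placed in the branch queue
    alternate_pc: int      # PC of the path the Ibox did not follow (assumed field)

def resolve_branch(entry: BranchQueueEntry, actually_taken: bool) -> Optional[int]:
    """Return the redirect PC on a mispredict, or None when the prediction was right."""
    if entry.predicted_taken == actually_taken:
        return None
    # Mispredict: the instruction stream is redirected to the alternate path.
    return entry.alternate_pc
```

Pipeline aborts and the RLOG unwind are not modeled here.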

5.3.3.11 First Part Done Handling 

During the execution of one of the 8 string instructions that are implemented by the CPU, an 
exception or an interrupt may be detected. In that event, the Ebox microcode saves all state 
necessary to resume the instruction in the GPRs, backs up PC to point to the opcode of the string 
instruction, sets PSL<FPD> in the saved PSL, and dispatches to the handler for the interrupt or 
exception. 

When the interrupt or exception is resolved, the software handler terminates with an REI back to 
the instruction. When the Ibox decodes an instruction with PSL<FPD> set, it stops parsing the 
instruction immediately after the opcode. In particular, it does not parse the specifiers. When the 
microsequencer finds PSL<FPD> set at a macroinstruction boundary, it substitutes the address 
of a special FPD handler for the instruction execution flow. 

The FPD handler determines which instruction is being resumed from the opcode, unpacks the 
state saved in the GPRs, clears PSL<FPD>, advances PC to the end of the string instruction (by 
adding the opcode PC to the length of the instruction, which was part of the saved state), and 
jumps back to the middle of the interrupted instruction. 
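
The PSL and PC adjustments made by the FPD handler can be sketched as follows. This is a minimal sketch: the function name is hypothetical, PSL<FPD> is taken to be bit 27 as defined by the VAX architecture, and the 32-bit wraparound is an assumption of the model.

```python
PSL_FPD = 1 << 27   # PSL<FPD> bit position in the VAX PSL
MASK32 = 0xFFFFFFFF

def fpd_resume(psl: int, opcode_pc: int, instruction_length: int):
    """Clear PSL<FPD> and advance PC past the string instruction."""
    new_psl = (psl & ~PSL_FPD) & MASK32
    # The instruction length was part of the state saved in the GPRs.
    new_pc = (opcode_pc + instruction_length) & MASK32
    return new_psl, new_pc
```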

5.3.3.12 Cache and Memory Hardware Errors 

Cache and memory hardware errors are detected by the Mbox or Cbox, depending on the type 
of error. If the error is recoverable (e.g., a Pcache tag parity error on a write simply disables 
the Pcache), it is reported via a soft error interrupt request and is dispatched to the operating 
system. 

In some instances, write errors that are not recoverable by hardware are reported via a hard 
error interrupt request, which results in the invocation of the operating system. 

Read errors that are not recoverable by hardware are reported via the assertion of a soft error 
interrupt, and also in a manner that is similar to that used for memory management exceptions, 
as described above. In fact, the MD file, PA queue, and the Ibox all contain a hardware error flag 
in parallel with the memory management fault flag. With the exception of TB parity errors, which 
cause an immediate microtrap request, the event is reported to the Ebox in exactly the same way 
as the equivalent memory management exception would be, but the microcode exception handler 
is different. For example, an unrecoverable error on a specifier read would set the hardware error 
flag in the MD file. When the flag is referenced, the error flag is injected into the pipeline. When 
the flag advances to the S4 segment and is selected by the RMUX, it causes a microtrap request 
which invokes a hardware error handler rather than a memory management handler. 

Note that certain other errors are reported in the same way. For example, if the memory man- 
agement sequencer in the Mbox receives an unrecoverable error trying to read a PTE necessary 
to translate a destination specifier, it sets the hardware error flag in the PA queue for the entry 
corresponding to the specifier. This results in a microtrap to the hardware error handler when 
the entry is referenced. PTE read errors for read references are also reported via the original 
reference. 

5.4 Revision History 



Table 5-1: Revision History 

Who            When           Description of change 

Mike Uhler     06-Mar-1989    Release for external review. 
Gil Wolrich    15-Nov-1990    Update for NVAX Plus external release. 

Chapter 6 

Microinstruction Formats 



6.1 Ebox Microcode 

The NVAX Plus microword consists of 61 bits divided into two major sections. Bits <60:15> control 
the Ebox Data Path and are encoded into two formats. Bits <14:0> control the Microsequencer 
and are also encoded into two formats. 
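
The split of the 61-bit microword into its two major sections can be sketched as below. The function name is hypothetical; the bit ranges are the ones stated above.

```python
def split_microword(word: int):
    """Split a 61-bit microword into (data path control, microsequencer control)."""
    if not 0 <= word < (1 << 61):
        raise ValueError("microword must fit in 61 bits")
    data_path = word >> 15      # bits <60:15>, 46 bits
    sequencer = word & 0x7FFF   # bits <14:0>, 15 bits
    return data_path, sequencer
```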



6.1.1 Data Path Control 

The Data Path Control Microword specifies all the information needed to control the Ebox Data 
Path. The two formats, Standard and Special, are selected by bit <60>, the FORMAT bit. In 
addition, bit <45>, the LIT bit, selects the constant generation format of the microword, which 
may be either an 8-bit constant or a 10-bit constant, depending on a decode in the MISC field. 
Pictures of the microword formats are in Figure 6-1 and Figure 6-2. A brief description of each 
field is given in Table 6-1 and Table 6-2. 

Figure 6-1 : Ebox Data Path Control, Standard Format 



[Bit-field diagram not legible in the source. Fields, from bit 60 down to bit 15: FORMAT, ALU, 
MRQ, Q, SHF, LIT, VAL, B, L, W, V, DST, A, MISC. In the constant generation variants, bits 
<44:35> hold POS and CONST when MISC is not equal to CONST.10, or CONST.10 when MISC is 
equal to CONST.10. Bit positions are listed in Table 6-1.] 



Table 6-1 : EBOX Data Path Control Microword Fields, Standard Format 

Bit Position   Microword Field   Microword Format   Description 

60             FORMAT            -                  Microword format: Standard or Special 
59:55          ALU               Both               ALU function select 
54:50          MRQ               Both               Mbox request select 
49             Q                 Standard           Q register load control 
48:46          SHF               Standard           Shifter function select 
45             LIT               Both               ALU/shifter B port control: register or literal 
44:40          VAL               Standard (1)       Constant shift amount 
39:35          B                 Both (1)           ALU/shifter B port select 
44:43          POS               Both (2)           Constant position 
42:35          CONST             Both (2)           8-bit constant value 
44:35          CONST.10          Both (3)           10-bit constant value 
34             L                 Both               Length control 
33             W                 Both               Wbus driver control 
32             V                 Both               VA write enable 
31:26          DST               Both               WBUS destination select 
25:20          A                 Both               ALU/shifter A port select 
19:15          MISC              Both               Miscellaneous function select, group 0 

(1) Not a constant generation microword variant 
(2) 8-bit constant generation microword variant, when MISC field is not equal to CONST.10 
(3) 10-bit constant generation microword variant, when MISC field is equal to CONST.10 
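
The Standard format field positions in Table 6-1 can be exercised with a small decoder. This is a sketch under stated assumptions: the function names are hypothetical, LIT = 1 is assumed to select the constant generation variants, and because the MISC encoding that selects CONST.10 is not given in this section, the caller supplies that decision through the hypothetical `misc_selects_const10` argument.

```python
def bits(word: int, hi: int, lo: int) -> int:
    """Extract bits <hi:lo> of word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

def decode_standard(word: int, misc_selects_const10: bool = False) -> dict:
    """Decode a Standard format data path control microword (bits <60:15>)."""
    fields = {
        "FORMAT": bits(word, 60, 60), "ALU": bits(word, 59, 55),
        "MRQ": bits(word, 54, 50), "Q": bits(word, 49, 49),
        "SHF": bits(word, 48, 46), "LIT": bits(word, 45, 45),
        "L": bits(word, 34, 34), "W": bits(word, 33, 33),
        "V": bits(word, 32, 32), "DST": bits(word, 31, 26),
        "A": bits(word, 25, 20), "MISC": bits(word, 19, 15),
    }
    if fields["LIT"] == 0:
        # Register variant (assumed polarity): shift amount and B port select.
        fields["VAL"] = bits(word, 44, 40)
        fields["B"] = bits(word, 39, 35)
    elif misc_selects_const10:
        fields["CONST.10"] = bits(word, 44, 35)   # 10-bit constant variant
    else:
        fields["POS"] = bits(word, 44, 43)        # 8-bit constant variant
        fields["CONST"] = bits(word, 42, 35)
    return fields
```

The same bits <44:35> are interpreted three ways, which is why the decoder returns different keys depending on LIT and the MISC decode.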



Figure 6-2: Ebox Data Path Control, Special Format 

[Bit-field diagram not legible in the source. The field layout is listed in Table 6-2; as in the 
Standard format, bits <44:35> hold CONST.10 when MISC is equal to CONST.10.] 



Table 6-2: EBOX Data Path Control Microword Fields, Special Format 



Bit Position   Microword Field   Microword Format   Description 

60             FORMAT            -                  Microword format: Standard or Special 
59:55          ALU               Both               ALU function select 
54:50          MRQ               Both               Mbox request select 
49:46          MISC1             Special            Miscellaneous function select, group 1 
45             LIT               Both               ALU/shifter B port control: register or literal 
44:41          MISC2             Special            Miscellaneous function select, group 2 
40             DISABLE.RETIRE    Special (1)        Instruction retire disable 
39:35          B                 Both (1)           ALU/shifter B port select 
44:43          POS               Both (2)           Constant position 
42:35          CONST             Both (2)           8-bit constant value 
44:35          CONST.10          Both (3)           10-bit constant value 
34             L                 Both               Length control 
33             W                 Both               Wbus driver control 
32             V                 Both               VA write enable 
31:26          DST               Both               WBUS destination select 
25:20          A                 Both               ALU/shifter A port select 
19:15          MISC              Both               Miscellaneous function select, group 0 

(1) Not a constant generation microword variant 
(2) 8-bit constant generation microword variant, when MISC field is not equal to CONST.10 
(3) 10-bit constant generation microword variant, when MISC field is equal to CONST.10 



6.1.2 Microsequencer Control 

The Microsequencer Control Microword supplies the information necessary for the Microsequencer 
to calculate the address of the next microinstruction. The basic computation done by the 
Microsequencer involves selecting a base address from one of several sources, and then optionally 
modifying three bits of the base address to get the final next address. 

Bit <14>, SEQ.FMT, selects between Jump and Branch formats. Figure 6-3 and Figure 6-4 show 
the two formats. Table 6-3 and Table 6-4 describe each of the fields. 
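
Figure 6-3: Ebox Microsequencer Control, Jump Format 

[Bit-field diagram not legible in the source. Fields, from bit 14 down to bit 0: SEQ.FMT (0), 
SEQ.CALL, SEQ.MUX, J. Bit positions are listed in Table 6-3.] 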



Table 6-3: Ebox Microsequencer Control Microword Fields, Jump Format 



Bit Position   Microword Field   Microword Format   Description 

14             SEQ.FMT           -                  Microsequencer format: Jump or Branch 
13             SEQ.CALL          Both               Subroutine call 
12:11          SEQ.MUX           Jump               Next address select 
10:0           J                 Jump               Next address 



Figure 6-4: Ebox Microsequencer Control, Branch Format 


[Bit-field diagram not legible in the source. Fields, from bit 14 down to bit 0: SEQ.FMT (1), 
SEQ.CALL, SEQ.COND, BR.OFF. Bit positions are listed in Table 6-4.] 






Table 6-4: Ebox Microsequencer Control Microword Fields, Branch Format 


Bit Position   Microword Field   Microword Format   Description 

14             SEQ.FMT           -                  Microsequencer format: Jump or Branch 
13             SEQ.CALL          Both               Subroutine call 
12:8           SEQ.COND          Branch             Microbranch condition select 
7:0            BR.OFF            Branch             Page offset of next address 
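
The two microsequencer control formats in Table 6-3 and Table 6-4 can be decoded as sketched below. The function name is hypothetical, and the polarity of SEQ.FMT (0 for Jump, 1 for Branch) is an assumption read from the partially legible figures rather than from the tables.

```python
def decode_sequencer(ctl: int) -> dict:
    """Decode the 15-bit microsequencer control field, bits <14:0>."""
    if not 0 <= ctl < (1 << 15):
        raise ValueError("microsequencer control must fit in 15 bits")
    fields = {"SEQ.FMT": (ctl >> 14) & 1, "SEQ.CALL": (ctl >> 13) & 1}
    if fields["SEQ.FMT"] == 0:
        # Jump format (assumed encoding): mux select plus an 11-bit next address.
        fields["SEQ.MUX"] = (ctl >> 11) & 0x3
        fields["J"] = ctl & 0x7FF
    else:
        # Branch format: condition select plus an 8-bit page offset.
        fields["SEQ.COND"] = (ctl >> 8) & 0x1F
        fields["BR.OFF"] = ctl & 0xFF
    return fields
```

How the selected base address has three of its bits modified to form the final next address is not specified in this section, so the sketch stops at field extraction.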



6.2 Ibox CSU Microcode 

The Ibox complex specifier unit is controlled by a 29-bit microword, as shown in Figure 6-5. A 
brief description of each field is given in Table 6-5. 

Table 6-5: Ibox CSU Microword Fields 

Bit Position   Microword Field   Description 

28:26          ALU               ALU function select 
25             DL                Data length control 
24:22          A                 ALU A port select 
21:19          B                 ALU B port select 
18:16          DST               Wbus destination 
15:13          MISC              Miscellaneous function select 
12:9           MREQ              Mbox request select 
8:7            MUX.CNT           Next address mux select 
6:0            NXT               Next address 
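
The CSU microword layout in Table 6-5 lends itself to a table-driven decoder. The sketch below is illustrative; the names `CSU_FIELDS` and `decode_csu` are hypothetical, while the field names and bit positions come from the table.

```python
# (field name, high bit, low bit) as listed in Table 6-5
CSU_FIELDS = [
    ("ALU", 28, 26), ("DL", 25, 25), ("A", 24, 22), ("B", 21, 19),
    ("DST", 18, 16), ("MISC", 15, 13), ("MREQ", 12, 9),
    ("MUX.CNT", 8, 7), ("NXT", 6, 0),
]

def decode_csu(word: int) -> dict:
    """Decode a 29-bit Ibox CSU microword into its named fields."""
    if not 0 <= word < (1 << 29):
        raise ValueError("CSU microword must fit in 29 bits")
    return {
        name: (word >> lo) & ((1 << (hi - lo + 1)) - 1)
        for name, hi, lo in CSU_FIELDS
    }
```

The field widths sum to 29, which matches the stated width of the microword.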


6.3 Revision History 




Table 6-6: Revision History 

Who               When           Description of change 

Debra Bernstein   06-Mar-1989    Release for external review. 
Mike Uhler        13-Dec-1989    Update for second-pass release. 

Chapter 7 
The Ibox 



7.1 Overview 

The NVAX Plus IBOX chapter includes the overview description, IPR specifications, and description 
of IBOX testability features from the NVAX CPU Chip Specification. For the detailed and complete 
IBOX specification, refer to the NVAX CPU Chip Specification. 

7.1.1 Introduction 

This chapter describes the Ibox section of the NVAX Plus CPU chip. The 4-stage Ibox pipeline 
(S0..S3) runs semi-autonomously to the rest of the NVAX Plus CPU and supports the following 
functions: 



• Instruction Stream Prefetching 

The Ibox attempts to maintain sufficient instruction stream data to decode the next instruc- 
tion or operand specifier. 

• Instruction Parsing 

The Ibox identifies the instruction opcodes and operand specifiers, and extracts the informa- 
tion necessary for further processing. 

• Operand Specifier Processing 

The Ibox processes the operand specifiers, initiates the required memory references, and 
provides the Ebox with the information necessary to access the instruction's operands. 

• Branch Prediction 

Upon identification of a branch opcode, the Ibox hardware predicts the direction of the branch 
(taken vs. not taken). For branch taken predictions, the Ibox redirects the instruction 
prefetching and parsing logic to the branch destination, where instruction processing resumes. 

Figure 7-1 is a top level block diagram of the Ibox showing the major Ibox sub-sections and their 
inter-connections. 

This chapter presents a high-level description of the Ibox functions, then provides details of the 
Ibox sub-sections which support each function. 

Figure 7-1: Ibox Block Diagram 

7.1.2 Functional Overview 

The Ibox fetches, parses, and processes the instruction stream, attempting to maintain a constant 
supply of parsed VAX instructions available to the Ebox for execution. The pipelined nature of the 
NVAX Plus CPU allows for multiple macroinstructions to reside within the CPU at various stages 
of execution. The Ibox, running semi-autonomously to the Ebox, parses the macroinstructions 
following the instruction that is currently in Ebox execution. Performance gains are realized 
when the time required for instruction parsing in the Ibox is hidden during the Ebox execution of 
an earlier instruction. The Ibox places the information generated while parsing ahead into Ebox 
queues. 

The Instruction Queue contains instruction specific information which includes the instruction 
opcode, a floating point instruction flag, and an entry point for the Ebox microcode. 

The Source Queue contains information about the source operands for the instructions in the 
instruction queue. Source queue entries contain either the actual operand (as in a short literal), 
or a pointer to the location of the operand. 

The Destination Queue contains information required for the Ebox to select the location for 
execution results storage. The two possible locations are the VAX General Purpose Registers 
(GPRs) and memory. 

These queues allow the Ibox to work in parallel with the Ebox. As the Ebox consumes the entries 
in the queues, the Ibox parses ahead adding more. In the ideal case, the Ibox would stay far 
enough ahead of the Ebox such that the Ebox would never have to stall because of an empty 
queue. 

The Ibox needs access to memory for instruction and operand data. Instruction and operand data 
requests are made through a common port to the Mbox. All data for both the Ibox and the Ebox 
is returned on a shared M%MD_BUS<63:0>. 

The Ibox port feeds Mbox queues to smooth memory request traffic over time. The Specifier 
Request Latch holds Ibox requests for operand data. The Instruction Request Latch holds Ibox 
requests for instruction stream data. These 2 latches allow the Ibox to issue memory requests 
for both instruction and operand data even though the Mbox may be processing other requests. 

The Ibox supports 4 main functions: 

1. Instruction Stream Prefetching 

2. Instruction Parsing 

3. Operand Specifier Processing 

4. Branch Prediction 

Instruction Stream Prefetching works to provide a steady source of instruction stream data for 
instruction parsing. While the instruction parsing logic works on one instruction, the instruction 
prefetching logic fetches several instructions ahead. 

The Instruction Parsing logic parses the incoming instruction stream, identifying and pre- 
processing each of the instruction's components. The instruction opcodes and associated informa- 
tion are passed directly into the Ebox instruction queue. Operand specifier information is passed 
on to the operand specifier processing logic. 

The Operand Specifier Processing logic locates the operands in registers, in memory, or in the 
Instruction Stream. This logic places operand information in the Ebox source and destination 
queues, and makes the required operand memory requests. 

The Ibox does not have prior knowledge of branch direction for branches which rely on Ebox 
condition codes. The Branch prediction logic makes a prediction on which way the branch will 
go and forces the Ibox to take that path. This logic saves the program counter of the alternate 
branch path, so that in the event that Ebox branch execution shows that the prediction was 
wrong, the Ibox can be redirected to the correct branch direction. 

7.2 VIC Control and Error Registers 

The VIC contains 4 internal processor registers (IPRs) which provide VIC control and read/write 
access to the arrays. 

MACROCODE RESTRICTION 

VIC_ENABLE must be cleared before writing to the VIC IPRs: VMAR, VDATA, or VTAG. 
VIC_ENABLE must be cleared before reading from VIC IPRs: VDATA, VTAG. In functional 
operation, an REI must precede the MTPR which enables the VIC. 
See Section 7.4 for details of the IPR mechanism. 

Figure 7-2: VMAR Register 

Table 7-1: VMAR Register 

Name        Bit(s)   Type   Description 

LW          2        WO     Longword select bit. Selects longword of sub-block for access to 
                            cache array. 
SUB_BLOCK   4:3      RW     Sub-block select. Selects data sub-block for access to cache array; 
                            also latches VIBA<4:3> on VIC parity errors. 
ROW_INDEX   10:5     RW     Row select. Row index for read and write access to cache array; 
                            also latches VIBA<10:5> on VIC parity errors. 
ADDR        31:11    RO     Error address field. Latches tag portion of VIBA on VIC parity 
                            errors. 



When the VIC is disabled, the VIC Memory Address Register (VMAR) may be used as an index 
for direct IPR access to the cache arrays. VMAR<10:5> supply the cache row index, VMAR<4:3> 
supply the cache sub-block, and VMAR<2> indicates the longword within a quadword address. 

VMAR also latches and holds the VIBA<31:3> on VIC array parity errors. 
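
The VMAR field layout can be illustrated with a short decoder. The function name is hypothetical; the field names and bit positions are those of Table 7-1.

```python
def decode_vmar(vmar: int) -> dict:
    """Split a 32-bit VMAR value into the fields of Table 7-1."""
    vmar &= 0xFFFFFFFF
    return {
        "ADDR": (vmar >> 11) & 0x1FFFFF,   # bits <31:11>, error address (tag portion)
        "ROW_INDEX": (vmar >> 5) & 0x3F,   # bits <10:5>, cache row index
        "SUB_BLOCK": (vmar >> 3) & 0x3,    # bits <4:3>, quadword sub-block
        "LW": (vmar >> 2) & 0x1,           # bit <2>, longword within the quadword
    }
```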

Figure 7-3: VTAG Register 

Table 7-2: VTAG Register 

Name   Bit(s)   Type   Description 

V      3:0      RW     Data valid bits. Supply data valid bits on array read/writes 
DP     7:4      RW     Data parity bits. Supply data parity on array read/writes 
TP     8        RW     Tag parity bit. Supplies tag parity on tag array read/writes 
TAG    31:11    RW     Tag. Supplies tag on tag array read/writes 



The VTAG IPR provides read and write access to the cache tag array. An IPR write to VTAG will 
write the contents of the M%MD_BUS to the tag, parity, and valid bits for the row indexed by 
VMAR<10:5>. VTAG<31:11> are written to the cache tag. VTAG<8> is written to the associated tag 
parity bit. VTAG<7:4> are used to write the four data parity bits associated with the indexed cache 
row. Similarly, VTAG<3:0> write the four data valid bits associated with the cache row. DP<3:0> 
and V<3:0> are the data parity and data valid bits, respectively, for the 4 quadwords of data in 
the same row. DP<0> and V<0> correspond to the quadword of data addressed when address bits 
4:3 = 00, DP<1> and V<1> correspond to the quadword of data addressed when address bits 4:3 
= 01, etc. 
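
The correspondence between address bits 4:3 and the per-quadword DP and V bits can be sketched as follows. The function names are hypothetical; bit positions follow Table 7-2.

```python
def decode_vtag(vtag: int) -> dict:
    """Split a 32-bit VTAG value into the fields of Table 7-2."""
    return {
        "TAG": (vtag >> 11) & 0x1FFFFF,  # bits <31:11>
        "TP": (vtag >> 8) & 0x1,         # bit <8>, tag parity
        "DP": (vtag >> 4) & 0xF,         # bits <7:4>, one parity bit per quadword
        "V": vtag & 0xF,                 # bits <3:0>, one valid bit per quadword
    }

def quadword_status(vtag: int, address: int):
    """Return (valid, parity) for the quadword selected by address bits 4:3."""
    fields = decode_vtag(vtag)
    sub_block = (address >> 3) & 0x3
    valid = (fields["V"] >> sub_block) & 1
    parity = (fields["DP"] >> sub_block) & 1
    return valid, parity
```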



Figure 7-4: VDATA Register 

Table 7-3: VDATA Register 

Name   Bit(s)   Type   Description 

DATA   31:0     RW     Data for data array reads and writes 



The VDATA IPR provides read and write access to the cache data array. When VDATA is written, 
the cache data array entry indexed by VMAR is written with the IPR data. Since the IPR data is 
a longword, two accesses to VDATA are required to read or write a quadword cache sub-block. 

Writes to VDATA with VMAR<2> = 0 simply accumulate the IPR data destined for the low longword 
of a sub-block in FILL_DATA<31:0>. A subsequent write to VDATA with VMAR<2> = 1 directs the 
IPR data to FILL_DATA<63:32>, and triggers a cache write sequence to the sub-block indexed 
by VMAR. 

Reads to VDATA with VMAR<2> = 0 trigger a cache read sequence to the sub-block indexed by 
VMAR. The low longword of the sub-block is returned as IPR read data. A read of VDATA with 
VMAR<2> = 1 returns the high longword of the sub-block as IPR data. 
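
The two-step longword access sequence can be modeled as below. This is a simplified sketch: the class and method names are hypothetical, the data array is a dictionary keyed by VMAR<10:3>, and reads index the array directly rather than modeling a separate cache read sequence for the low longword.

```python
MASK32 = 0xFFFFFFFF

class VicDataPortModel:
    """Illustrative model of VDATA accesses with the VIC disabled."""

    def __init__(self):
        self.array = {}      # quadword sub-blocks keyed by VMAR<10:3>
        self.fill_data = 0   # stands in for FILL_DATA<63:0>

    @staticmethod
    def _index(vmar: int) -> int:
        return (vmar >> 3) & 0xFF   # ROW_INDEX<10:5> and SUB_BLOCK<4:3>

    def write_vdata(self, vmar: int, data: int) -> None:
        data &= MASK32
        high_longword = (vmar >> 2) & 1
        if high_longword == 0:
            # Accumulate the low longword only; no array write yet.
            self.fill_data = ((self.fill_data >> 32) << 32) | data
        else:
            # Place the high longword and trigger the array write.
            self.fill_data = (self.fill_data & MASK32) | (data << 32)
            self.array[self._index(vmar)] = self.fill_data

    def read_vdata(self, vmar: int) -> int:
        quadword = self.array.get(self._index(vmar), 0)
        if (vmar >> 2) & 1:
            return (quadword >> 32) & MASK32
        return quadword & MASK32
```

Writing the low longword first and the high longword second is what commits a full quadword; reversing the order would write stale low data.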

Figure 7-5: ICSR Register 

Table 7-4: ICSR Register 

Name     Bit(s)   Type   Description 

ENABLE   0        RW,0   Enable Bit. When set, allows cache access to the VIC. Initializes to 
                         0 on RESET. 
LOCK     2        WC     Lock Bit. When set, validates and prevents further modification of 
                         the error status bits in the ICSR and the error address in the VMAR 
                         register. When clear, indicates no VIC parity error has been recorded 
                         and allows ICSR and VMAR to be updated. 
DPERR    3        RO     Data Error Bit. When set, indicates data parity error occurred in 
                         data array if Lock Bit also set. 
TPERR    4        RO     Tag Error Bit. When set, indicates tag parity error occurred in tag 
                         array if Lock Bit also set. 



The ICSR IPR provides control and status functions for the Ibox. VIC tag and data parity errors 
are latched in the read-only ICSR<4:3>, respectively. ICSR<2> is set when a tag or data parity 
error occurs and keeps the error status bits and the VMAR register from being modified further. 
Writing a logic one to ICSR<2> clears the LOCK bit and allows the error status to be updated. 
When ICSR<2> is clear, the values in ICSR<4:3> are meaningless. When ICSR<2> is set, a VIC 
parity error has occurred, and either ICSR<4> or ICSR<3> will be set indicating that the parity 
error was either a tag parity error or a data parity error, respectively. ICSR<4:3> cannot be cleared 
from software. ICSR<0> provides IPR control of the VIC enable. It is cleared on RESET. 
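
The LOCK behavior can be illustrated with a small register model. It is a sketch under stated assumptions: the class and method names are hypothetical, and the choice to overwrite stale DPERR/TPERR values when a new error is recorded with LOCK clear is an assumption, since the text says only that those values are meaningless while LOCK is clear.

```python
class IcsrModel:
    ENABLE = 1 << 0
    LOCK = 1 << 2
    DPERR = 1 << 3
    TPERR = 1 << 4

    def __init__(self):
        self.value = 0  # ENABLE initializes to 0 on RESET

    def record_parity_error(self, tag_error: bool) -> bool:
        """Hardware side: latch an error unless LOCK already holds one."""
        if self.value & self.LOCK:
            return False  # status is frozen until software clears LOCK
        self.value &= ~(self.DPERR | self.TPERR)
        self.value |= self.LOCK | (self.TPERR if tag_error else self.DPERR)
        return True

    def write(self, data: int) -> None:
        """Software side: ENABLE is read/write, LOCK is write-one-to-clear."""
        self.value = (self.value & ~self.ENABLE) | (data & self.ENABLE)
        if data & self.LOCK:
            self.value &= ~self.LOCK
        # DPERR and TPERR are read-only and cannot be cleared from software.
```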

7.3 VIC Performance Monitoring Hardware 

Hardware exists in the Ibox VIC to support the NVAX Performance Monitoring Facility. See 
Chapter 16 for a global description of this facility. 

The VIC hardware generates two signals, I%PMUX0 and I%PMUX1, which are driven to the central 
performance monitoring hardware residing in the Ebox. These two signals are used to supply 
VIC hit rate data to the performance monitoring counters. 

I%PMUX0 is asserted in the cycle when a VIC read reference is first attempted while the prefetch 
queue is not full. I%PMUX1 signals the hit status for this event in the same cycle. 

The data is captured only on the first read reference that could be used by the PPQ to avoid skewed 
hit ratios caused by multiple hits or misses to the same reference while the prefetch queue is full 
or the VIC is waiting for a cache fill. 
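
How two such signals could yield a hit ratio can be sketched with a pair of counters. This is illustrative only: the class is hypothetical, it does not describe the actual performance monitoring counters, and it assumes that the second signal asserted means a hit.

```python
class VicHitCounter:
    """Counts qualifying VIC read references and hits, one sample per cycle."""

    def __init__(self):
        self.references = 0
        self.hits = 0

    def clock(self, pmux0: bool, pmux1: bool) -> None:
        if pmux0:                 # first attempt of a qualifying read reference
            self.references += 1
            if pmux1:             # hit status in the same cycle (assumed polarity)
                self.hits += 1

    def hit_ratio(self):
        return self.hits / self.references if self.references else None
```

Because the hit signal is sampled only when the reference signal is asserted, repeated hits or misses to a stalled reference do not skew the ratio.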

7.4 Ibox IPR Transactions 

The Ebox microcode communicates with the Ibox in part through internal processor registers 
(IPRs). The IPR reads are handled by CSU microcode. The IPR write control is distributed; however,
the description is included here for completeness.

Ebox microcode conventions guarantee that the Ibox is idle before initiating Ibox IPR transactions.
This is accomplished either by the knowledge that the current Ebox microcode flow takes place in
a macroinstruction with a drain Ibox assist or by asserting an explicit E%STOP_IBOX command.
The only exception involves the issuing of an IPR transaction when the CSU is involved in an RLOG
unwind operation. In this case the unwind finishes in the CSU, then the CSU processes the latched
IPR command. If the RLOG is empty when the microcode initiates an unwind, 0 will be added to
whatever GPR is pointed to by the read pointers.

MICROCODE RESTRICTION 

E%EBOX_LOAD_PC and E%IBOX_IPR_WRITE must not occur in the same cycle.

7.4.1 IPR Reads 

The Ebox signifies an IPR read by asserting the E%EBOX_IPR_READ strobe, the E%EBOX_IPR_NUM,
and the E%IBOX_IPR_INDEX. This information is latched in the S1 logic stage, and an IPR request
flag is posted. The S1 next address logic responds by creating an IPR dispatch to an IPR microaddress
in the utility page of microcode, and by clearing the IPR request flag. All Ibox logic blocks
associated with IPR reads examine the E%EBOX_IPR_NUM. If the IPR source is within a section,
that section prepares to drive the IPR read data onto the VIC_REQ_ADDR. The microcode at the
common IPR routine reads the VIC_REQ_ADDR, passes the value through the ALU, and writes the
data to an Ebox working register located at the E%IBOX_IPR_INDEX offset in the register array.
The VIC_REQ_ADDR is used as the IPR read data source simply because it is a convenient 32-bit bus
that runs through the entire section.

7.4.2 IPR Writes 

The Ebox signifies an IPR write by asserting the E%EBOX_IPR_WRITE strobe and the E%EBOX_IPR_NUM.
All Ibox logic blocks associated with IPR writes examine the E%EBOX_IPR_NUM. If the IPR
destination is within a section, that section prepares to accept the IPR write data from the
M%MD_BUS. The Mbox drives the M%MD_BUS with IPR data and asserts M%IBOX_IPR_WR to complete the
transaction.






7.5 Branch Prediction IPR Register 



The BPCR IPR provides control for the BPU and read/write access to the history array. The
write-only BPCR<FLUSH_BHT> bit causes a BPU branch history table flush. The flush is identical to the
context switch flush, which resets all branch table entries to a neutral value: history bits = 0100.
The write-only BPCR<FLUSH_CTR> bit causes the BRANCH_TABLE_COUNTER<8:0> to be cleared.
The BRANCH_TABLE_COUNTER provides an address into the branch table for IPR read and write
accesses. Each IPR read from the BPCR or write to the BPCR with BPCR<LOAD_HISTORY> =
1 increments the counter. This allows IPR branch table reads and writes to step through the
branch table array. BPCR<LOAD_HISTORY> enables writes to the branch history table. A write
to the BPCR<HISTORY> field with BPCR<LOAD_HISTORY> = 1 causes a BPU branch history
table write. The history bits for the entry indexed by the counter are written with the IPR data.
BPCR reads supply the history bits in BPCR<HISTORY> for the entry indexed by the counter.
BPCR<MISPREDICT> will return a "1" if the last conditional branch mispredicted. BPCR<31:16>
contain the branch prediction algorithm. Any IPR write to the BPCR will update the algorithm.
An IPR read will return the value of the current algorithm. For example, a "0" in BPCR<16>
means that the next branch encountered will not be taken if the history is "0000". A "1" in
BPCR<21> means that the next branch encountered when the prior history is "0101" will be
taken.
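
The two examples above suggest that the 4-bit history selects one bit of BPCR<31:16>, with history value h mapping to BPCR<16+h>. A minimal sketch under that reading follows; the function name and the example register value are hypothetical.

```python
def predict_taken(bpcr, history):
    """Return True if BPCR<31:16> predicts 'taken' for a 4-bit branch history.

    Assumes history value h selects BPCR<16 + h>, as the two examples
    in the text imply.
    """
    if not 0 <= history <= 0b1111:
        raise ValueError("history must be a 4-bit value")
    algorithm = (bpcr >> 16) & 0xFFFF   # BPCR<31:16>
    return bool((algorithm >> history) & 1)


# Hypothetical register value with BPCR<21> = 1 and BPCR<16> = 0.
example_bpcr = 1 << 21
```

With this value, a history of "0101" predicts taken and a history of "0000" predicts not taken, matching the examples in the text.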



Figure 7-6: BPCR Register

The microcode will write the following bit pattern as part of the powerup sequence:

31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 9 8|7 6 5 4|3 2 1 0
1111111101100101 01 All 0's |



Table 7-5: BPCR Register

Name            Bit(s)   Type   Description

HISTORY         3:0      RW     Branch history table entry history bits.
MISPREDICT      5        RO     Indicates if last conditional branch mispredicted.
FLUSH_BHT       6        WO     Write of a 1 resets all history table entries to a neutral value;
                                hardware clears bit.
FLUSH_CTR       7        WO     Write of a 1 resets BPCR address counter to 0; hardware clears bit.
LOAD_HISTORY    8        WO     Write history array addressed by BPCR address counter.
BPU_ALGORITHM   31:16    RW     Controls direction of branch for given history.

Bits 8, 7, and 6 are defined in Table 7-6 for IPR writes to the BPCR. NOTE: The prediction algorithm
will be updated on every IPR write to the BPCR.


Table 7-6: BPCR<8:6>

BIT 8   BIT 7   BIT 6   Write Action

0       0       0       Do nothing, except update algorithm
0       0       1       Flush branch table. History not written
0       1       0       Address counter reset to 0. History not written
0       1       1       Flush branch table, reset address counter, history not written
1       0       0       Write history to table, counter automatically increments
1       0       1       Undefined: Branch table flushed, new history written, counter incremented
1       1       0       Undefined: Write history to old counter value, counter reset to 0
1       1       1       Undefined: Branch table flushed, write history to old counter value, counter
                        reset to 0
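
The defined rows of Table 7-6 can be sketched as a small software model of BPCR access. This is an illustration only: the class and its attribute names are hypothetical, the 512-entry table size is inferred from the 9-bit BRANCH_TABLE_COUNTER<8:0>, counter wraparound is assumed, and the combinations marked "Undefined" are rejected rather than modelled. BPCR<MISPREDICT> is not modelled.

```python
NEUTRAL_HISTORY = 0b0100   # neutral value given in the text
TABLE_ENTRIES = 512        # assumption: implied by BRANCH_TABLE_COUNTER<8:0>

FLUSH_BHT = 1 << 6         # BPCR<6>
FLUSH_CTR = 1 << 7         # BPCR<7>
LOAD_HISTORY = 1 << 8      # BPCR<8>


class BranchTableIpr:
    """Toy model of IPR access to the branch history table via the BPCR."""

    def __init__(self):
        self.table = [NEUTRAL_HISTORY] * TABLE_ENTRIES
        self.counter = 0
        self.algorithm = 0

    def _step(self):
        # Assumption: the 9-bit counter wraps around.
        self.counter = (self.counter + 1) % TABLE_ENTRIES

    def write(self, data):
        self.algorithm = (data >> 16) & 0xFFFF      # updated on every write
        control = data & (LOAD_HISTORY | FLUSH_CTR | FLUSH_BHT)
        if (control & LOAD_HISTORY) and control != LOAD_HISTORY:
            raise ValueError("undefined BPCR<8:6> combination")
        if control & FLUSH_BHT:
            self.table = [NEUTRAL_HISTORY] * TABLE_ENTRIES
        if control & FLUSH_CTR:
            self.counter = 0
        if control == LOAD_HISTORY:
            self.table[self.counter] = data & 0xF   # BPCR<HISTORY>
            self._step()

    def read(self):
        value = (self.algorithm << 16) | self.table[self.counter]
        self._step()                                # each read increments the counter
        return value
```

Resetting the counter and then issuing repeated reads or LOAD_HISTORY writes steps through the table one entry at a time, which is the access pattern the text describes.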



7.6 Testability 

7.6.1 Overview 

Ibox testability is enhanced by architectural features, and connection to the internal scan register 
and the parallel port. 

7.6.2 Internal Scan Register and Data Reducer 

Ibox hardware state may be latched and shifted off-chip through the global internal scan register. 
See Chapter 17 for the implementation details of the internal scan register. State included on 
the internal scan register for chip debug is TBD. 

An Ibox linear feedback shift register (LFSR) is part of the internal scan chain. The register is
an observation-only structure which can be loaded in parallel or loaded in parallel with feedback,
acting like a data reducer. The contents may be shifted out serially through the internal scan
register. Table 7-7 lists the signals that are contained in the Ibox LFSR.

Table 7-7: Ibox Scan Chain Fields

Field Name    # bits   Description

E_DL          2        Data length for instruction (DL of last operand)
SPEC_CTRL     21       spec_ctrl bits <21:13> and <11:0>
STOP_PARSER   2        Stop parser and status flags



7.6.3 Parallel Port 

The CSU microcode address is routed to the chip parallel port. The microcode address can be
monitored on a cycle by cycle basis during chip debug by selecting the Ibox as source to the
parallel port. When selected, a buffered version of the control store address, MUX W H<6:0>, appears
on PP_DATA<6:0>. See Chapter 17 for the implementation details of the parallel port.

7.6.4 Architectural Features 

Internal processor registers are included as architectural features to aid in testability. IPR access
to VIC tags and data is available through the VTAG and VDATA registers. See Section 7.2 for
the implementation details of these registers. IPR access to the branch history table and
branch status is available through the BPCR register. See Section 7.5 for the implementation
details of the BPCR.

7.6.5 Metal 3 Nodes 

Various Ibox nodes are brought up to minimum size CMOS-4, metal-3 test pads for chip debug.
State included on the internal scan register for chip debug is TBD. 

7.6.6 Issues 

Internal scan register states in the Ibox for chip debug are TBD. 
Nodes elevated to metal-3 test pads in the Ibox for chip debug are TBD. 

7.7 Performance Monitoring Hardware 



The Ibox provides two signals for performance monitoring: I%PM_VIC_ACC_H and I%PM_VIC_HIT.
These signals enable the Ebox performance monitoring hardware to gather statistics on VIC hits 
versus VIC accesses. 



7.7.1 Signals

7.8 Revision History



Table 7-8: Revision History

Who                                When          Description of change

Shawn Persels                      06-Oct-1988   Initial release.
John F. Brown                      19-Dec-1988   Partial Update.
John F. Brown, Paul Gronowski,     06-Mar-1989   Release for external review.
Jeanne McKinley
John F. Brown, Ruben Castelino,    12-Jan-1990   Intermediate release.
Mary Field, Paul Gronowski,
Jeanne Meyer
Gil Wolrich                        15-Nov-1990   Retain Overview, IBOX IPRs, and Testability sections
                                                 for NVAX Plus external release.

Chapter 8
The Ebox 



8.1 Chapter Overview 

The NVAX Plus EBOX chapter includes the overview description, IPR specifications, and description
of EBOX testability features from the NVAX CPU Chip Specification.

For a detailed and complete EBOX specification, refer to the NVAX CPU Chip Specification.

8.2 Introduction 

The Ebox is the instruction execution unit in the NVAX CPU chip. It is a 3 stage pipeline (S3..S5) 
which runs semi-autonomously to the rest of the NVAX Plus chip and supports the following 
functions: 

• Instruction Execution 

The Ebox is responsible for carrying out the execution portion of each VAX instruction under 
control of a microflow whose initial address is provided by the Ibox issue unit. 

• Instruction Coordination 

The Ebox is a major source of control to coordinate instruction processing in the Ibox, Mbox, 
and Fbox. It ensures that Ebox and Fbox macroinstructions retire in the proper order, and 
it provides controls to the Mbox and Ibox which help manage certain inter-macroinstruction 
dependencies. The Ebox cooperates with the Ibox in handling mispredicted branches. 

• Trap, Fault and Exception Handling 

The Ebox coordinates trap, fault, and interrupt handling. It delays the condition until all pre- 
ceding macroinstructions complete properly. It then collects information about the condition 
and ensures that the correct architectural state is reached. 

• CPU Control 

Most CPU control is provided by the Ebox. Ebox control functions include CPU initialization, 
controlling Ibox, Fbox, and Mbox activities, and setting control bits during major CPU state 
changes (e.g. taking an interrupt or executing a change mode instruction). 

The Ebox accomplishes many of the above functions by executing the NVAX Ebox microcode. This 
chapter views the Ebox as the interpreter of microcode. Describing how microcode functions are 
used to correctly emulate the VAX architecture or the architectural motivation for Ebox hardware 
functions is generally outside the scope of this discussion.

Figure 8-1 at the end of this section is a top level block diagram of the Ebox showing all the
major Ebox function units, their interconnections, and their place in the pipeline. The pipeline 
segments are shown in the diagram (S2, S3, S4, and S5). The sections following the diagram 
describe the function elements depicted and the Ebox pipeline.

8.3 Ebox Overview

8.3.1 Microword Fields

The Ebox is controlled by the data path control portion of the microword, which is either standard 
or special format. The other portion of the control word, the microsequencer control portion, 
controls the microsequencer which determines which microword is fetched in every cycle. The 
fields of the data path control portion of the microword and their effect within the Ebox are shown
in Table 8-1. For more information on microword formats and field widths see Chapter 6.

NOTATION 

The notation FIELD/FUNCTION is used throughout this chapter to mean that microword 
field FIELD specifies FUNCTION. 



Table 8-1: Data Path Control Microword Fields

Microword        Microword
Field            Format         Description

FORMAT           Both           This one-bit field determines whether the microword is in the special
                                format. If it is 1, the MISC1, MISC2, and D fields exist. If it is 0,
                                the Q, SHF, and VAL fields exist instead.

LIT              Both           This one-bit field determines whether the microword is the constant
                                generation variant (format). If it is 1, the POS and CONST fields
                                exist. If it is 0, the VAL and B fields exist instead in standard
                                format, and the MISC2, D, and B fields exist instead in special format.

ALU              Both           Sets the ALU function, including typical ALU operations, and others.

MRQ              Both           Controls initiation of Ebox memory accesses, VECTOR MEMORY ACCESSES,
                                and other Mbox control functions. The Ebox decodes the field and sends
                                the corresponding request to the Mbox.

SHF              Standard       Sets the shifter function. The W and Q fields control how the shifter
                                output is used. Some settings of this field specify a pass operation
                                instead of a shift.

VAL              Standard [1]   Specifies the shift amount (1 to 31) or, if VAL = 0, specifies to
                                shift the amount in the SC register.

A                Both           Specifies the source of E_BUS%ABUS<31:0> for this microword. The A
                                field can select any element in the register file or one of several
                                Ebox sources. E_BUS%ABUS<31:0> is one of the two sources for the ALU
                                and the shifter.

B                Both [1]       When the source of E_BUS%BBUS<31:0> is a register, this field specifies
                                the source of E_BUS%BBUS<31:0>. The B field can select from some of the
                                elements in the register file or from a small number of other Ebox
                                sources. E_BUS%BBUS<31:0> is one of the two sources for the ALU and
                                the shifter.

POS              Both [2]       When the source of E_BUS%BBUS<31:0> is the constant generator, this
                                field specifies which byte the constant value is in. Bytes 0 through 3
                                may be specified. The other bytes are forced to 0.

CONST            Both [2]       This field contains the literal byte value which is sourced to one of
                                the bytes of E_BUS%BBUS<31:0> as specified by the POS field. (The
                                other E_BUS%BBUS<31:0> bytes are forced to 0.)

CONST.10 [3]     Both [2]       This field contains the literal 10-bit value which is sourced to
                                E_BUS%BBUS<9:0>. (E_BUS%BBUS<31:10> are forced to 0.)

DST              Both           This field specifies the destination of E%WBUS<31:0>. The possible
                                destinations include a subset of the register file and a number of
                                other Ebox destinations.

Q                Standard       Controls whether or not the Q register is loaded with the shifter
                                output for this microword.

W                Both           Selects the driver of E%WBUS<31:0>. Either the ALU or the shifter
                                output is driven on E%WBUS<31:0>.

L                Both           This field controls whether the Ebox operations are done with a data
                                length of longword or the length specified in the DL register. The
                                Ebox operations affected are condition code calculation, size of
                                memory operations, zero extending of E%WBUS data, and bytes affected
                                by register file writes.

V                Both           Controls updating of the VA register. Either the VA register is
                                updated with the value from the ALU, or it is not changed from its
                                previous value.

MISC             Both           This field has many uses. Only one use can be selected at a time. This
                                field can control PSL condition code alterations, set the DL register,
                                set or clear state flags, or invoke a box coordination or control
                                function.

MISC1            Special        This field can specify one of a few Ibox or Fbox coordination or
                                control functions, and can be used to set or clear state flags.

MISC2            Special [1]    Specifies one Mbox control function and one function to add an Fbox
                                destination scoreboard entry.

DISABLE.RETIRE   Special [1]    This field is used to disable retire of macroinstructions and retire
                                queue entries.

[1] Not constant generation microword variant.
[2] Constant generation microword variant.
[3] The CONST.10 field is actually the POS field bitwise concatenated with the CONST field, with the
POS field in the more significant position. It is simply a way of treating these two microword
fields as one. CONST.10 is only used when MISC/CONST.10.BIT is specified.
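
The two constant-generation modes described in the table can be sketched as follows. The function names are hypothetical, and the 2-bit POS and 8-bit CONST widths are inferred from "bytes 0 through 3" and the 10-bit concatenation rather than stated directly here.

```python
def constant_from_pos(pos, const):
    """Byte constant: CONST placed in byte POS of the B bus, other bytes zero."""
    if not 0 <= pos <= 3 or not 0 <= const <= 0xFF:
        raise ValueError("POS must be 0..3 and CONST must fit in one byte")
    return const << (8 * pos)


def constant_10_bit(pos, const):
    """CONST.10: POS concatenated above CONST, driven on B bus bits <9:0>."""
    if not 0 <= pos <= 3 or not 0 <= const <= 0xFF:
        raise ValueError("POS must be 0..3 and CONST must fit in one byte")
    return (pos << 8) | const
```

For example, `constant_from_pos(2, 0xAB)` yields a longword with 0xAB in byte 2, while `constant_10_bit(1, 0x02)` treats the same two fields as a single 10-bit literal.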



When a microword field is not present in all formats, it defaults to NOP (no operation) when a
microword format without that field occurs. More specifically, standard format microwords effectively
specify MISC1/NOP, MISC2/NOP, and DISABLE.RETIRE/NO by default. Special format microwords
effectively specify Q/HOLD.Q, SHF/NOP, and VAL/0. When the microword is the constant generation
variant of the standard format microword, VAL/0 is effectively specified, and the B field is ignored
since this microword variant sources a constant onto E_BUS%BBUS<31:0>. In the constant generation
variant of the special format microword, MISC2/NOP and DISABLE.RETIRE/NO are effectively
specified, and the B field is ignored because this microword variant also sources a constant onto
E_BUS%BBUS<31:0>.
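
The four microword variants and their implied defaults can be tabulated in a short sketch. It assumes the FORMAT and LIT descriptions in Table 8-1 are complete, reads the "D" field as DISABLE.RETIRE, and uses hypothetical function names; it is not a description of the actual decode logic.

```python
# Fields that Table 8-1 lists as present in both formats and both variants.
COMMON_FIELDS = {"FORMAT", "LIT", "ALU", "MRQ", "A", "DST", "W", "L", "V", "MISC"}

# Effective value of a field when the microword variant does not carry it.
DEFAULTS = {
    "MISC1": "NOP", "MISC2": "NOP", "DISABLE.RETIRE": "NO",
    "Q": "HOLD.Q", "SHF": "NOP", "VAL": "0",
}


def present_fields(special, constant):
    """Return the set of fields carried by a microword variant."""
    fields = set(COMMON_FIELDS)
    if special:
        fields.add("MISC1")
    else:
        fields |= {"Q", "SHF"}
    if constant:
        fields |= {"POS", "CONST"}      # constant generation variant
    else:
        fields.add("B")
        fields |= {"MISC2", "DISABLE.RETIRE"} if special else {"VAL"}
    return fields


def effective_defaults(special, constant):
    """Return the defaults that apply because a field is absent."""
    present = present_fields(special, constant)
    return {name: value for name, value in DEFAULTS.items() if name not in present}
```

Calling `effective_defaults(False, False)` gives the MISC1/NOP, MISC2/NOP, and DISABLE.RETIRE/NO defaults that the text lists for standard format microwords.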

8.3.1.1 Microsequencer Control Fields

In addition to decoding the datapath control portion of the microword, the Ebox decodes a part
of the Microsequencer control portion of the microword. Specifically, it detects when the SEQ.FMT
and SEQ.MUX fields (see Chapter 9 and Chapter 6) specify LAST.CYCLE or LAST.CYCLE.OVERFLOW.
The Ebox fault detection logic and the RMUX control logic use these decodes.

8.3.2 The Register File 

The register file contains four kinds of registers: MD (memory data), GPR, Wn (working), and 
CPUSTATE registers. The MD registers receive data from memory reads initiated by the Ibox, 
and from direct writes from the Ibox. The Wn registers hold microcode temporary data. They 
can receive data from memory reads initiated by the Ebox and receive result data from ALU, 
shifter, or Fbox operations, and from the Ibox in the case of Ibox IPR reads. The GPRs are the VAX 
architecture general-purpose registers (though R15 is not in the file) and can receive data from 
Ebox initiated memory reads, from the ALU or shifter, or from the Ibox. The CPUSTATE registers 
hold semipermanent architectural state (e.g. KSP, SCBB). They can only be written by the Ebox. 

8.3.3 ALU and Shifter 

Each microword specifies source operands for the ALU or shifter (A, B, POS, and CONST fields), 
operations for these function units to perform (ALU, SHF, and VAL fields), and a destination (or 
possibly two destinations if Q or VA is updated) for the result(s) (DST, Q, W, and V fields). Note 
that in special format microwords no shifter operation can be specified and the Q register can't be 
altered. In the course of executing the microword, the Ebox will fetch the source operands onto 
E_BUS%ABUS<31:0> and E_BUS%BBUS<31:0>, carry out the specified ALU and shifter functions,
and store the result in the specified locations (if any). 

8.3.3.1 Sources of ALU and Shifter Operands 

In general the sources of E_BUS%ABUS<31:0> and E_BUS%BBUS<31:0> (the inputs to the ALU and 
shifter) are either a constant, a register from the register file, an Ebox register (e.g. PSL, Q, or 
VA), an Ebox source value calculated by a special function unit, a hardware status provided via 
a special path from outside the Ebox (e.g., interrupt status), or an entry from the source queue. 
E_BUS%BBUS<31:0> sources are limited to a subset of the register file, certain Ebox registers, and 
an entry from the source queue. The source queue is introduced in Section 8.3.4. 

8.3.3.2 ALU Functions 

The ALU is capable of standard operations on byte, word, and longword size operands. It can pass 
either input to the output and is capable of a number of arithmetic and logical operations on one 
or two operands, producing condition codes based on data length and operation. 

8.3.3.3 Shifter Functions 

The shifter does longword and quadword shift operations and certain pass-through operations, always
producing a longword output. The shifter treats the two sources as a single quadword, with
E_BUS%ABUS<31:0> as the more significant longword. The longword output is this quadword
shifted right 0 to 32 bits and truncated to longword length. The shifter produces condition codes
based on the longword output data.
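
The data path of the shift operation described above can be sketched in a few lines. The function name is hypothetical and condition code generation is not modelled.

```python
MASK32 = 0xFFFFFFFF


def shifter(abus, bbus, amount):
    """Shift the quadword {abus, bbus} right by 0..32 bits and keep the low longword."""
    if not 0 <= amount <= 32:
        raise ValueError("shift amount must be 0..32")
    quadword = ((abus & MASK32) << 32) | (bbus & MASK32)   # A bus is the high longword
    return (quadword >> amount) & MASK32
```

A shift of 0 passes the B bus value through unchanged, and a shift of 32 passes the A bus value.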

8.3.3.4 Destinations of ALU and Shifter Results

The output of the shifter and the output of the ALU can drive E%WBUS<31:0>. The shifter output 
is also directly connected to the Q register so that the Q register can be loaded with the shifter 
output regardless of the source of E%WBUS<31:0>. In the same way, the ALU output is directly
connected to the VA register. E%WBUS<31:0> data is the input to one of the write ports on the 
register file and can be used to update any register file entry except an MD register. Certain other 
Ebox registers (e.g. SC, PSL) can be loaded from E%WBUS<31:0>. 

The destination of E%WBUS<31:0> can be specified by the current destination queue entry, when 
the microword so specifies. The destination queue is introduced in the following section. 

8.3.4 Ibox-Ebox Interface 

The Ibox-Ebox interface is made up of a number of FIFO queues. The purpose of these queues is to 
allow the Ibox to fetch and decode new instructions before the Ebox is ready to execute them. The 
Ibox adds entries as it decodes instructions, and the Ebox removes them from the other end as it 
executes them. For each opcode, there is a predetermined number of entries added to the various 
queues by the Ibox. Ebox execution microflows remove exactly the right number of entries from 
each queue. 

The queues which interface the Ibox to the Ebox directly are the source queue, the destination 
queue, the branch queue, and the field queue. The instruction queue, the PA queue, and the
retire queue are introduced here for completeness. 

The source queue holds source operand information. Entries are added by the Ibox as it decodes 
the source type operand specifiers of each instruction. The entry is either a pointer into the 
register file or the data from a literal mode operand specifier. The Ebox accesses and removes
an entry each time a microword specifies a source queue access in either the A or B fields. If the 
entry is literal data, it is used as an ALU and/or a shifter operand. Otherwise the register file is 
accessed using the pointer in the entry. 

The destination queue holds result destination information. Entries are added by the Ibox as it 
decodes the destination type operand specifiers of each instruction. A destination queue entry 
is either a pointer to a GPR in the register file or a flag indicating that the result destination is 
memory. The Ebox accesses and removes an entry each time a microword specifies a destination 
queue access in the DST field or the Fbox supplies a result which specifies a destination queue 
access. If the entry is a pointer to a GPR, the Ebox writes the ALU, shifter, or Fbox data into the 
register file. Otherwise the data is stored in memory at the address found in the PA queue.

The PA queue is in the Mbox. Each time the Ibox adds an entry indicating a memory destination 
to the destination queue it also sends the Mbox a virtual address to be translated. When the 
Mbox has translated the address it puts it in the PA queue. If the current destination queue 
entry indicates a memory destination, the Ebox sends the result data to the Mbox to be written 
to the physical address found in the PA queue. The Mbox removes the PA queue entry as it uses 
it. 
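
The interaction between the destination queue and the PA queue can be sketched with two FIFOs. The entry encoding (a `("gpr", n)` or `("memory", None)` tuple) and the function name are hypothetical, and stalls on empty queues are not modelled.

```python
from collections import deque


def store_result(dest_queue, pa_queue, register_file, memory, data):
    """Consume one destination queue entry and deliver the result to it."""
    kind, gpr = dest_queue.popleft()          # Ebox removes the entry it uses
    if kind == "gpr":
        register_file[gpr] = data             # result goes to the register file
    else:
        address = pa_queue.popleft()          # Mbox removes the PA queue entry it uses
        memory[address] = data                # result is written to the physical address
```

Each memory-destination entry in the destination queue is paired, in order, with one translated address in the PA queue.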

The branch queue holds status bits for each branch instruction processed by the Ibox. The Ibox 
adds an entry to the branch queue each time it finishes processing a conditional or unconditional 
branch. The Ebox references and removes the current branch queue entry in the execution
microflow for the branch. This allows the Ebox to synchronize with the Ibox so that the branch
does not finish executing until the Ibox has successfully fetched the branch displacement specifier. 
It also allows the Ebox to check for an incorrect branch prediction by the Ibox.

Each time the Ibox decodes a branch it calculates the branch address. For unconditional branches
it simply begins fetching from the new instruction stream immediately. For conditional branches 
the Ibox predicts whether the branch will be taken or not. The branch queue entry added by 
the Ibox indicates the branch prediction. When the Ebox executes an unconditional branch, it 
references the branch queue simply to ensure that the Ibox was able to fetch the displacement 
specifier without a fault or error. For conditional branches the Ebox also checks that the branch 
prediction was correct and initiates a microtrap if it wasn't. If the prediction wasn't correct, the
Ebox notifies the Ibox, which uses the alternate path PC (which it had kept) to begin fetching
along the correct path.
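
The conditional branch check can be sketched as a comparison between the predicted and actual direction. The function and parameter names are hypothetical; the sketch only shows which PC the Ibox ends up fetching from.

```python
def check_prediction(predicted_taken, actually_taken, target_pc, fallthrough_pc):
    """Return (mispredicted, pc) where pc is the path the Ibox should be fetching."""
    if predicted_taken:
        predicted_pc, alternate_pc = target_pc, fallthrough_pc
    else:
        predicted_pc, alternate_pc = fallthrough_pc, target_pc
    mispredicted = predicted_taken != actually_taken   # this case triggers the microtrap
    return mispredicted, (alternate_pc if mispredicted else predicted_pc)
```

On a misprediction the returned PC is the alternate path PC that the Ibox kept when it made the prediction.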

The retire queue holds status for each macroinstruction currently being executed in the Ebox 
or the Fbox. The status indicates which unit will execute the instruction, the Ebox or the Fbox. 
The Ebox adds an entry each time the Microsequencer dispatches to a macroinstruction execution 
microflow. The Ebox references the retire queue when the macroinstruction execution is complete 
in order to ensure that instructions finish executing in the proper order. A certain amount of 
concurrent execution in the Fbox and Ebox is possible. The retire queue is used to prevent one 
box from altering any architecturally visible state before the other box's execution for preceding 
macroinstructions finishes. The Ebox references and removes a retire queue entry each time an 
Fbox or Ebox instruction is retired. 

The field queue holds a one-bit type status for variable-length bit field base address operands 
processed in the Ibox. (Note that some operands are treated as variable-length bit field base 
address operands internally by the NVAX CPU even though the operand is not really the base 
address of a variable-length bit field. These operands, including the true bit field base address 
operands, are collectively referred to as field operands.) The field queue entry indicates whether 
the field operand was register mode. The Ibox adds an entry when it processes operands which 
it knows by context require an entry. The Ebox retires an entry after it has used the information 
in a microcode conditional branch. Very different execution microflows are required for some 
instructions, particularly bit field instructions, depending on whether a particular operand is 
register mode or specifies a memory address. In the latter case the information sent by the Ibox 
is a memory address, while in the first case the source and destination queue entries point to the 
register in the register file. 

The instruction queue is part of the Ibox-Microsequencer interface. It holds information derived 
from the VAX instruction opcode. The Ibox adds an entry as it decodes each instruction. An 
entry contains the opcode, data length, the microcode dispatch address for execution, and a flag 
indicating whether the macroinstruction is for the Fbox. The Microsequencer references and 
removes an entry at the start of execution of each VAX instruction. It uses the dispatch address to 
fetch the first microword of the macroinstruction execution microflow. At the same time it passes 
the opcode, data length, and the Fbox execution flag to the Ebox. The Ebox adds an entry to 
the retire queue at that time. That entry is simply the Fbox execution flag (except if the Fbox is
disabled).

8.3.5 Other Registers and States 

The Ebox contains several special purpose registers, the SC, VA, and Q registers, and the PSL. 
The SC register holds a shift count for use in some shift operations. 

The VA register can hold a virtual address or a microcode temporary value. The VA register is 
directly readable by the Mbox and is the address source for all Ebox initiated memory operations. 
The VA register is loaded directly from the ALU output.

The PSL is the VAX architecture program status longword register. It is loaded from E%WBUS<31:0>
and can be used as a source operand by the ALU or shifter. Its bits are used in many places in 
the Ebox and elsewhere in the CPU where required by the VAX architecture. 

The Q register is loaded from the output of the shifter. It holds shifter results for later use. 

8.3.6 Ebox Memory Access 

Through the mechanism of the source queue and the destination queue, the Ibox initiates most 
memory accesses for the Ebox. In certain cases the Ebox must carry out memory accesses on 
its own. The MRQ field of the microword specifies the Mbox operation. The virtual or physical 
address is provided from the VA register. If the VA is being updated in this microword, the address
is bypassed directly from the output of the ALU. For writes, the data is taken from E%WBUS<31:0>, 
so it can be the output of the shifter or the ALU. For reads, the DST field of the microword specifies 
the register file entry which is to receive the data. This register must be a GPR or a working 
register. 

8.3.7 CPU Control Functions 

Most control functions are invoked through one of the MISC fields, but some of the MRQ field 
functions are Mbox control functions or miscellaneous control functions rather than memory 
access commands. The control functions generally act to reset a function unit (Fbox, Ibox, or 
Mbox), synchronize Ebox operation with a function unit, or restart semiautonomous operation of 
the Mbox or Ibox when either of them has stopped for some reason. 

8.3.8 Ebox Pipeline 

Execution of microwords in the Ebox is pipelined with three pipe stages (S3..S5). These stages
are shown in Figure 8-1. In the first stage (S3), the E_BUS%ABUS<31:0> and E_BUS%BBUS<31:0> 
sources are fetched or prepared. In the second (S4) the ALU and shifter operate on the data. In 
the third (S5) the result is written into the register file or to some other destination. Stages 
S3 and S4 can stall for various reasons. Stage S5 cannot stall. Once a particular microword's 
execution has advanced into S5, it is going to complete. Various stalls occur in S4 in order to 
ensure that a particular microword's effects do not change any architecturally visible state (e.g.,
GPRs, PSL) before proper completion without memory management faults is guaranteed. 

The Microsequencer fetches the microword and delivers it to the Ebox in S3. If the Ebox's S3 
stage is stalled, the Microsequencer's S2 activity is stalled as well. See Chapter 9 for more detail. 

Even though the operand fetch, function execution, and result store take place in different cycles, 
the microword specifies the operation as if it all took place in one cycle. The Ebox has bypass 
paths which allow a microword to use a register as a source even it it is updated by one of the two 
preceding microwords. For example, if the immediately preceding microword updates Wi in the 
register file and the current microword specifies Wi as a source to the ALU, the Ebox hardware 
detects the condition and muxes the data into the staging latch before the ALU at the same time 
as it forwards the data to the latch which sources E%WBUS<31:0> in stage S5. 
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Bypass paths are only implemented where performance considerations warrant. Also bypass- 
ing isn't the solution to every problem pipelining introduces. For example, after the PSL is 
updated the microcode allows 2 cycles before a microword spedfying SEQ.MUX/LAST.CYCLE or 
SEQ.MUX/IAST.CYCLE.OVERFLOW because the PSL is not actually updated until S5. The 
Microsequencer uses the FPD, T, and TP bits in the PSL to determine the proper new microflow 
dispatch. It would make the decision based on old PSL information if the microcode didn't allow 
the 2 cycles. 
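
The bypass condition described in this section can be sketched as a small model. This is a minimal illustration in Python, not the hardware implementation; the function name, the argument layout, and the register values are assumptions introduced here.

```python
# Toy model of Ebox register bypassing (illustrative only; names are hypothetical).
# A result is written to the register file in S5, so a microword reading a
# register in S3 must take the value from a bypass path if one of the two
# preceding microwords (now in S4 and S5) is about to update that register.

def read_source(reg, regfile, in_flight):
    """Return the value a microword in S3 should see for `reg`.

    regfile   -- dict mapping register name to its committed value
    in_flight -- list of (dest_reg, value) for the preceding microwords,
                 most recent first (at most two: the ones in S4 and S5)
    """
    for dest, value in in_flight[:2]:
        if dest == reg:
            return value          # bypass: the newest pending write wins
    return regfile[reg]           # no hazard: read the register file


regfile = {"W1": 10, "W2": 20}
# The immediately preceding microword writes W1; the one before it writes W2.
pending = [("W1", 99), ("W2", 7)]
print(read_source("W1", regfile, pending))
print(read_source("W2", regfile, pending))
```

State that has no bypass path, such as the PSL bits used by the Microsequencer, is not covered by this model; there the microcode has to leave the required gap itself.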

One place where the effect of pipelining is particularly apparent is in microcode conditional 
branches. For example, a microcode branch based on E_BUS%BBUS<31:0> data must immediately 
follow the microword which sources the relevant data onto E_BUS%BBUS<31:0>. Similarly, a 
microcode branch based on the ALU condition codes must be the second microword after the one 
which specified the ALU operation. See Chapter 9 for more detail on microcode branches. 

8.3.9 Pipeline Stalls 

The Ebox pipeline is controlled by the stall and fault logic. This function unit supplies stall 
signals which are used to gate clocking of control and data latches in each stage. It also controls 
insertion of effective no-ops into S4 when S3 is stalled and into S5 when S4 is stalled. 

The Ebox pipeline stalls in S3 when it is accessing a source operand in the register file or the 
source queue which is not valid. Many register file entries have a valid bit associated with them. 
A register file entry is not valid, and its valid bit is not set, if a memory read has been initiated 
for that entry and hasn't yet completed. A source queue entry is not valid if the Ibox hasn't added 
that entry yet. 

The Ebox stalls in S4 if the current destination queue entry is not valid and the microword in 
S4 references a destination queue entry. A destination queue entry is not valid if the Ibox hasn't 
added that entry yet. 

The Ebox stalls in S4 if the current destination queue entry is valid but specifies a memory 
destination for the data and the current PA queue entry is not valid. A PA queue entry is not 
valid if the Mbox hasn't added that entry yet. 

The Ebox stalls in S4 if the microword in S4 requests a memory operation and the Mbox is 
already working on an Ebox initiated memory operation (that is, the previous request is still in 
the EM_LATCH). 

The Ebox stalls in S4 if the microword in S4 synchronizes with the branch queue and the branch 
queue entry is not valid. A branch queue entry is not valid if the Ibox hasn't added that entry 
yet. 

The Ebox stalls in S4 if the current retire queue entry specifies that an Fbox instruction must 
retire before the instruction associated with the microword in S4 and the Ebox is requesting the 
use of the RMUX to store result data. (The Ebox requests the use of the RMUX if the microword in 
S4 specifies anything other than NONE in the DST field.) 

If the Ebox stalls in S3, the S4 and S5 stages of the pipeline can continue execution. If S4 doesn't 
stall when S3 does, then an effective no-op is inserted into S4 after the current S4 operation 
advances into S5. The no-op is necessary so that the stalled S3 microword isn't advanced to S4 
and S5 while an S3 stall is in effect. 

If the Ebox stalls in S4 then S3 stalls as well. (Microwords can't pass each other in the pipeline.) 
During S4 stalls, an effective no-op is inserted into S5 after the operation in S5 completes. This 
is necessary so that the operation in S4 isn't advanced into S5 while an S4 stall is in effect. 
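
The stall propagation rules above can be sketched as a one-cycle update function. This is a simplified model under stated assumptions: the dictionary representation, the `advance` function, and the `"nop"` marker are hypothetical and stand in for the gated latches and effective no-ops of the real stall logic.

```python
# Toy model of Ebox stall propagation (illustrative; not the actual control logic).
NOP = "nop"

def advance(pipe, s3_stall, s4_stall, next_word):
    """Advance the pipeline by one cycle.

    pipe      -- dict with keys "S3", "S4", "S5" holding the current microwords
    next_word -- microword the Microsequencer would deliver to S3 this cycle

    S5 never stalls. An S4 stall also stalls S3. A stalled stage keeps its
    microword, and an effective no-op is inserted into the stage after it.
    """
    s3_stall = s3_stall or s4_stall      # microwords cannot pass each other
    new = {}
    new["S5"] = NOP if s4_stall else pipe["S4"]
    if s4_stall:
        new["S4"] = pipe["S4"]
    else:
        new["S4"] = NOP if s3_stall else pipe["S3"]
    new["S3"] = pipe["S3"] if s3_stall else next_word
    return new


pipe = {"S3": "C", "S4": "B", "S5": "A"}
pipe = advance(pipe, s3_stall=True, s4_stall=False, next_word="D")
print(pipe)   # S3 still holds C, a no-op enters S4, and B moves into S5
```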

In any cycle that the Ibox has not made a microstore dispatch address available to the 
Microsequencer and a dispatch is needed (i.e., during the last cycle of any microflow), the 
microsequencer fetches the STALL microword. This microword specifies no Ebox operation and can't 
cause a stall anywhere in the pipeline (although it does specify SEQ.MUX/LAST.CYCLE). This allows 
the microwords already in the pipeline to continue even when the Ibox is temporarily unable to 
supply new instruction execution dispatches. See Chapter 9 for more detail. 

A microcode loop which repeatedly accesses the field queue until the current field queue entry 
becomes valid is also very much like a stall, though the stall logic is not actually involved. This 
condition is referred to as a field queue stall. In this situation, the Ebox pipeline advances in 
each cycle (unless the microword in S4 is stalled also). However, the same microword is fetched 
out of the control store in every cycle. In typical microcode usage of the field queue conditional 
branch, this microword will not alter any state in S4 or S5. 

8.3.10 Microtraps, Exceptions, and Interrupts 

The Ebox and Microsequencer together coordinate the handling of exceptions and interrupts. 
Most interrupts and some exceptions are handled by Microsequencer dispatching to a microcode 
exception handler routine at the end of the current VAX instruction. These dispatches do not affect 
the execution of microwords already in the pipeline. Other exceptions cause a microtrap. In a 
microtrap the Microsequencer signals the Ebox to cause stages S3, S4, and S5 of the Ebox control 
pipeline to be flushed. It also signals the Ebox to flush the retire queue. (Flushing of the other 
Ibox-to-Ebox queues, the Fbox pipeline, and the specifier queue in the Mbox is done by microcode, 
except in the case of a branch misprediction.) At the same time the Microsequencer fetches a new 
microword from a special dispatch address in the control store based on the particular microtrap 
condition. This microflow handles any other necessary state flushing. Because a microtrap affects 
microwords already in the pipeline, the Ebox delays handling most traps until the microword 
which incurred the fault has reached S4. The microtrap is taken at the time that microword 
would normally have entered S5. In certain cases, Ebox stalls delay a microtrap until the stall 
is ended. The purpose of this is to ensure that operations which are part of a preceding VAX 
instruction are allowed to complete properly. 

Most of the microtraps which the Ebox delays until S4 are due to Ibox-initiated memory operations 
which had an access or translation fault. Faults due to Ibox-initiated reads are detected by the 
Ebox when it accesses a valid MD register from the register file, and the fault bit associated with 
that MD is set. Each MD register has a fault bit which is set by the Ibox or the Mbox when a fault 
occurs in the memory reads necessary to fetch the source data. When the Ebox accesses an MD 
register with its fault bit set in S3, it carries that fault status down the pipeline into S4. 

All faults detected in S3 are piped to S4 before they cause a microtrap. Faults detected in S4 or 
piped to S4 will cause a microtrap only if the Ebox is next to retire a macroinstruction. Otherwise 
they are delayed until the Fbox retires an instruction and the retire queue entry indicates the 
Ebox. 

Fault status signals are sent by the Ibox for entries in the instruction queue, source queue, field 
queue, destination queue, and branch queue. Entries in the PA queue have fault bits. The Ebox 
detects a fault when it accesses a PA queue entry with its fault bit set or when it finds the 
instruction queue, source queue, field queue, destination queue, or branch queue empty and one 
of the fault status signals from the Ibox asserted. In the case of the instruction queue, the fault is 
detected in S2 and carried into S3 only when there is no S3 stall. In the case of the source queue 
and field queue, the faults are detected in S3. Instruction queue, source queue, and field queue 
related faults are carried down the pipeline until they reach S4, where they cause a microtrap 
once the Ebox is next to retire a macroinstruction. 

Faults encountered in Ebox-initiated memory operations cause the Microsequencer to trap im- 
mediately. Ebox memory accesses begin in S5 so these traps cannot affect microwords from 
preceding VAX instructions. It is up to microcode to make sure that the last Ebox memory access 
has completed properly before the Microsequencer dispatches to another VAX instruction execution 
microflow. 

Hardware errors are essentially handled in the same way as faults. 

8.3.11 Ebox IPRs 

The CPUSTATE registers contained in the Register File are used by the microcode to hold el- 
ements of architectural state. They are read and written only by the EBOX. There are 10 
CPUSTATE registers: KSP, ESP, SSP, USP, ISP, ASTLVL, SCBB, PCBB, SAVEPC, and SAVEPSL. 
Also the Ebox implements two IPRs. They are IPRs 124-125 (decimal), PCSCR and ECR. 

ECR is a possible source of E_BUS%ABUS<31:0>, accessed by specifying ECR in the A field of the 
microword. ECR and PCSCR are also possible destinations of E%WBUS<31:0>, written by specifying 
PCSCR or ECR in the DST field of the microword. On writes, the entire register is written, regardless 
of the current DL value. 

8.3.11.1 IPR 124, Patchable Control Store Control Register 

The PCSCR is used to load control store patches. Chapter 9 describes the patchable control store 
function in detail. Figure 8-2 and Table 8-2 show the bit fields and give descriptions. 

Figure 8-2: PCS Control Register, PCSCR 

Table 8-2: PCSCR Field Descriptions 

Name               Bit(s)  Type  Description 

PAR_PORT_DIS       8       WO,0  Writing a 1 disables control by the testability parallel port of 
                                 the section of the internal scan used in loading the control 
                                 store CAM (content addressable memory) and RAM. This is 
                                 necessary when using this register to load the control store 
                                 CAM and RAM. 

PCS_ENB            9       WO,0  Enables the control store CAM and RAM so that patches are 
                                 fetched and supersede the control store ROM. 

PCS_WRITE          10      WO    The event of writing a 1 to this bit causes the PCS scan chain 
                                 contents to be written into the control store CAM and RAM. 
                                 The control signal which enables the write returns to the 
                                 inactive state automatically; there is no need for software to 
                                 write a 0 to this bit after writing a 1. 

RWL_SHIFT          11      WO    The event of writing a 1 to this bit causes the PCS scan chain 
                                 to shift by one. The control signal which enables the shift 
                                 returns to the inactive state automatically; there is no need 
                                 for software to write a 0 to this bit after writing a 1. 

DATA               12      WO    This bit holds the data which is shifted into the PCS scan 
                                 chain when a 1 is written to RWL_SHIFT. By repeatedly setting 
                                 DATA and writing a 1 to RWL_SHIFT, software can shift 
                                 any data pattern into the PCS scan chain. 

NONSTANDARD_PATCH  23      RW    This bit is set by software after loading a microcode patch. If 
                                 it is 1, it indicates a non-standard microcode patch has been 
                                 loaded. This bit is returned as bit <8> in a read from the SID 
                                 processor register, except that 0 is substituted for this bit in 
                                 microcode for a SID read if PCSCR<PCS_ENB> is 0. 

PATCH_REV          28:24   RW    This field is set by software after loading a microcode patch. It 
                                 indicates the revision of the standard microcode patch which 
                                 has been loaded. This field is returned as bits <13:9> in a read 
                                 from the SID processor register, except that 0 is substituted 
                                 for this field in microcode for a SID read if PCSCR<PCS_ENB> 
                                 is 0. 


8.3.11.2 IPR 125, Ebox Control Register 

The ECR is used to configure certain Ebox functions. Figure 8-3 and Table 8-3 show the bit fields 
and give descriptions. 

Figure 8-3: Ebox Control Register, ECR 

Table 8-3: ECR Field Descriptions 

Name                    Bit(s)  Type  Description 

VECTOR_PRESENT          0       RW,0  This bit is for vector unit support in a future version of this 
                                      chip. 

FBOX_ENABLE             1       RW,0  This bit is set by configuration code to enable the Fbox. 

TIMEOUT_EXT             2       RW,0  This bit is set by configuration code to select an external 
                                      timebase for the S3 stall timeout timer. Since the NVAX Plus 
                                      input clock requirements are for the test clock inputs to be 
                                      deasserted in system operation, selecting an external time base 
                                      results in the disabling of S3 timeouts. 

FBOX_ST4_BYPASS_ENABLE  3       RW,0  This bit is set by configuration code to enable Fbox Stage 4 
                                      bypass. 

TIMEOUT_OCCURRED        4       WC    This bit indicates that an S3 stall timeout occurred. Writing 
                                      it with 1 clears it. 

TIMEOUT_TEST                    RW,0  If this bit is a 1, the S3 timeout circuit counts cycles instead 
                                      of cycles in which E%TIMEOUT_ENABLE_H is asserted. In this test 
                                      mode the S3 stall timeout time is roughly 50 microseconds 
                                      instead of roughly 3 seconds. 

TIMEOUT_CLOCK                   RO    This bit is the most significant bit of the timeout base counter. 
                                      It is used as an indication that E%TIMEOUT_ENABLE_H is functioning 
                                      (though some logic is not covered by this test). It should be 1 
                                      half of the time and 0 the other half of the time. The period 
                                      of oscillation is 65536 times the cycle time of the chip or of 
                                      the waveform on P%OSC_TC1_H, depending on ECR<TIMEOUT_EXT>. 
                                      For ECR<TIMEOUT_EXT> set to 0 and a 14 nsec cycle 
                                      time, this is a period of roughly 900 microseconds. 

ICCS_EXT                        RW    This bit is not used for NVAX Plus. NVAX Plus supports 
                                      the full interval timer support with ICCS, NICR, and ICR 
                                      processor registers implemented in the NVAX Plus CBOX. 

FBOX_TEST_ENABLE        13      RW,0  When this bit is set to a 1, E%FBOX_TEST_ENB_H is asserted. This 
                                      puts the Fbox in a test mode in which data is passed from 
                                      stage to stage unaltered. 

PMF_ENABLE              16      RW,0  This bit is the internal implementation of the PME processor 
                                      register. 

PMF_MUX                 18:17   RW,0  This field selects the source of events counted by the 
                                      performance monitoring facility, when enabled, to be Ibox, Ebox, 
                                      Mbox, or Cbox. 

PMF_EMUX                21:19   RW,0  This field selects the EBOX events counted by the 
                                      performance monitoring facility, when the performance monitoring 
                                      facility is configured to count Ebox events. 

PMF_LFSR                22      RW,0  This bit enables the E%WBUS_H<31:0> LFSR (linear feedback shift 
                                      register) accumulator. This is a testability feature. 

PMF_CLEAR               31      WO    Writing a 1 to this bit clears the performance monitoring 
                                      facility counters (which are also the E%WBUS_H<31:0> LFSR 
                                      accumulator). It is not implemented in hardware. Microcode 
                                      handles this function. 
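
The TIMEOUT_CLOCK period quoted in Table 8-3 can be checked with a short calculation. The factor of 65536 and the 14 nsec cycle time are from the field description; the variable names are introduced here for illustration only.

```python
# Worked arithmetic for the TIMEOUT_CLOCK oscillation period with
# ECR<TIMEOUT_EXT> = 0. Both constants are taken from the field description.
CYCLES_PER_PERIOD = 65536
cycle_time_ns = 14

period_us = CYCLES_PER_PERIOD * cycle_time_ns / 1000
print(f"{period_us:.1f} microseconds")
```

The result is a little over 917 microseconds, which agrees with the "roughly 900 microseconds" stated in the table.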



NOTE 

THE SUBSET INTERVAL TIMER FUNCTIONALITY IS REMOVED FROM NVAX 
Plus. 



8.3.12 Initialization 

The main mechanism for Ebox initialization is the power-up microtrap, and the MISC/RESET.CPU 
which occurs in the first microword of this microtrap flow. When this trap occurs, the Microsequencer 
will assert E_USQ%PE_ABORT, aborting the Ebox pipeline as it does for any microtrap. None of 
the registers in the register file or elsewhere in the Ebox are cleared on initialization, except that 
IPR bits are cleared where indicated by the bit type (see Section 8.3.11). The state flags are also 
cleared by reset. 

The Ebox asserts E%STOP_IBOX, E%FLUSH_EBOX, E%FLUSH_MBOX, and E%FLUSH_FBOX during 
reset. This is the same effect as MISC/RESET.CPU. See the sections on initialization for each of the 
boxes for more detail. 

8.3.13 Testability 

This section describes the testability features in the Ebox. 

8.3.13.1 Parallel Port Test Features 

The following signals can be observed on the parallel test port. 

• E%S3_STALL 

• E%S4_STALL 

• E%RMUX_S4_STALL 

• Ebox retire queue output 

• E_USQ%PE_ABORT 

The following control functions are available on the parallel test port. 

• Force source queue stall 

Forces a source queue stall in any microword which accesses the source queue regardless of 
the actual number of entries in the queue. 

• Force destination queue stall 

Forces a destination queue stall in any microword which accesses the destination queue 
regardless of the actual number of entries in the queue. 

• Force branch queue stall 

Forces a branch queue stall in any microword which accesses the branch queue regardless of 
the actual number of entries in the queue. 

8.3.13.2 Observe Scan 

A number of signals in the Ebox are readable using the internal scan chain. Most of these are 
control signals. 

This is a list of the signals on the scan chain. They all are connected for observe only. 

• E%WBUS<31:0> LFSR. 

• The EM bus outputs. 

• The significant stall result signals and enough of the precursors to allow determination of 
which stall is in effect. 

• The significant fault results and E_USQ%PE_ABORT. 

• The bus E_USQ%UTEST. 

8.3.13.3 E%WBUS<31:0> LFSR 

E%WBUS<31:0> has an LFSR (linear feedback shift register) accumulator. Its output can be scanned 
out via the observe scan chain. It can be reset to zero by TBS control. 

ISSUE 

The control to clear E%WBUS<31:0> LFSR will be specified when the testability strategy 
is settled. 

8.3.14 Revision History 



Table 8-4: Revision History 

Who              When         Description of change 

John Edmondson   30-NOV-1988  Initial Release. 
John Edmondson   19-DEC-1988  Corrections and Updates. 
John Edmondson   06-MAR-1989  Release for external review. 
John Edmondson   29-NOV-1989  Updates after external review and modeling complete. 
John Edmondson   18-DEC-1989  Further updates, particularly adding real signal names. 
John Edmondson   31-JAN-1990  Updates reflecting minor implementation motivated changes 
                              - rev 0.5. 
John Edmondson   4-MAY-1990   Updates reflecting minor implementation motivated changes 
                              - post rev 0.5. 
Gil Wolrich      15-Nov-1990  EBOX chapter for NVAX Plus external release 

Chapter 9 

The Microsequencer 



9.1 Overview 

This chapter includes the microsequencer block diagram and descriptions of major hardware com- 
ponents including the Control Store, Patchable Control Store, and Microtest Bus, and the mi- 
crosequencer testability features. The Microsequencer chapter of the NVAX CPU Chip Functional 
Specification should be referred to for complete description of the Microsequencer. 

The microsequencer is a microprogrammed finite state machine that controls the three Ebox 
sections of the NVAX Plus pipeline: S3, S4, and S5. The microsequencer itself resides in the S2 
section of the pipeline. It accesses microcode contained in an on-chip control ROM, and microcode 
patches contained in an on-chip SRAM. Each microword is made up of fields that control all three 
pipeline stages. A complete microword is issued to S3 each cycle, and the appropriate microword 
decodes are pipelined forward to S4 and S5 under Ebox control. 

Each microword contains a microsequencer control field that specifies the next microinstruction 
in the microflow. This field may specify an explicit address contained in the microword or direct 
the microsequencer to accept an address from another source. It also allows the microcode to 
conditionally branch on various NVAX states. 

Frequently used microcode can be made into microsubroutines. When a microsubroutine is called, 
the return address is pushed onto the microstack. Up to six levels of subroutine nesting are 
possible. 

Stalls, which are transparent to the microcoder, occur when an NVAX resource is unavailable, 
such as when the ALU requires an operand that has not yet been provided by the Mbox. The 
microsequencer stalls when S3 of the Ebox is stalled. 

Microtraps allow the microcoder to deal with abnormal events that require immediate service. 
For example, a microtrap is requested on a branch mispredict, when the Ebox branch calculation 
is different from that predicted by the Ibox for a conditional branch instruction. When a microtrap 
occurs, the microcode control is transferred to a service microroutine. 

9.2 Functional Description 

9.2.1 Introduction 

The NVAX microsequencer consists of several functional units of logic that are explained in the 
following sections and illustrated in the block diagram, Figure 9-1. 

9.2.2 Control Store 

The control store is an on-chip ROM which contains the microcode used to execute macroinstructions 
and microtraps. It is made up of up to 1600 microwords. These are arranged as 200 entries, 
each entry consisting of 8 microwords. Each microword is 61 bits long, with bits <14:0> being 
used to control the microsequencer. The remainder of the microword, bits <60:15>, is used by the 
Ebox to control S3 through S5. The Ebox also receives bits <14,12:11>, enabling it to recognize 
the last cycle of a microflow and the validity of the microtest bus select lines. 

The control store access is performed during Φ34 of S2 and Φ1 of S3 of the NVAX pipeline. The 
output of the Current Address Latch, E_USQ_CAL%CAL_H<10:0>, is used to address the control 
store. Bits <10:4,0> are used to select one of the 200 entries. The eight microwords in the selected 
entry then enter an eight-way multiplexer, where E_USQ_CAL%CAL_H<3:1> select the final control 
store output. This structure is used because E_USQ_CAL%CAL_H<3:1> are valid later than bits 
<10:4,0>, since E_USQ_CAL%CAL_H<3:1> must be OR'd with the microtest bus for a BRANCH 
format microinstruction. 
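
The two-step control store lookup can be sketched as an address split. The bit assignments are from the text above; the function name `split_cal` and the way bits <10:4> and bit <0> are packed into a single entry-select value are assumptions made for illustration.

```python
# Sketch of how an 11-bit control store address splits into an entry select
# and a microword select. Bit assignments are from the text; the packing of
# the entry-select value is a hypothetical choice for this example.

def split_cal(addr):
    """Split CAL<10:0> into (entry_select, word_select).

    entry_select -- 8-bit value formed from bits <10:4> and bit <0>
    word_select  -- 3-bit value from bits <3:1>, steering the eight-way mux
    """
    assert 0 <= addr < (1 << 11)
    upper = (addr >> 4) & 0x7F        # bits <10:4>
    bit0 = addr & 1                   # bit <0>
    entry_select = (upper << 1) | bit0
    word_select = (addr >> 1) & 0x7   # bits <3:1>
    return entry_select, word_select


print(split_cal(0b00000101101))
```

Eight entry-select bits can address 256 entries, of which the ROM implements 200. Keeping bits <3:1> out of the entry select lets the slower, microtest-dependent bits arrive late and only steer the final multiplexer.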

9.2.2.1 Patchable Control Store 

The patchable control store is an on-chip SRAM which contains microcode patches. It consists of 
up to 20 microwords. It operates in parallel with the control store. The microaddress from the 
CAL is the input to its CAM (Content Addressable Memory). If the address hits in the CAM, the 
output of the patchable control store is selected as the new microword, rather than the output of 
the regular control store. 

The patchable control store and CAM are precharged in Φ3 and evaluate in Φ41. The CAL output, 
E_USQ_CAL%CAL_H<10:0>, is used in its entirety as the lookup address in the CAM, as opposed 
to the 1-of-200 selection followed by the 1-of-8 selection used in the ROM control store. 

Entries in the Patchable Control Store and its CAM are written under software control from 
registers in the Ebox. The CAM is disabled during this operation. 
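
The patch override behavior can be modeled in a few lines. This is a toy sketch, not the hardware: the class `ControlStore`, its method names, and the placeholder microword strings are hypothetical, and the CAM is modeled as a dictionary keyed by the full 11-bit microaddress. The `enabled` flag stands in for PCSCR<PCS_ENB>.

```python
# Toy model of the patchable control store lookup (illustrative only).

class ControlStore:
    def __init__(self, rom, max_patches=20):
        self.rom = rom                  # dict: microaddress -> ROM microword
        self.patches = {}               # CAM + RAM: microaddress -> patch microword
        self.max_patches = max_patches  # the text gives "up to 20 microwords"
        self.enabled = False            # stands in for PCSCR<PCS_ENB>

    def load_patch(self, addr, microword):
        if addr not in self.patches and len(self.patches) >= self.max_patches:
            raise ValueError("patchable control store is full")
        self.patches[addr] = microword

    def fetch(self, addr):
        # A CAM hit selects the patch; otherwise the ROM output is used.
        if self.enabled and addr in self.patches:
            return self.patches[addr]
        return self.rom[addr]


cs = ControlStore(rom={0x024: "rom-word-24", 0x028: "rom-word-28"})
cs.load_patch(0x024, "patched-word-24")
print(cs.fetch(0x024))   # ROM word: patches are not yet enabled
cs.enabled = True
print(cs.fetch(0x024))   # patched word: the address hits in the CAM
print(cs.fetch(0x028))   # ROM word: no CAM hit for this address
```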

9.2.2.2 Microsequencer Control Field of Microcode 

The microsequencer control field of the NVAX microword is used to help select the next microword 
address. The next address source is explicitly coded in the current microword; there is no concept 
of sequential next address. 

The SEQ.FMT field, bit <14> of the microsequencer control field, selects between the following 
two formats: 

Table 9-1: Jump Format Control Field Definitions 

Name       Bit(s)  Description 

SEQ.FMT    14      0 for JUMP 
SEQ.CALL   13      Controls whether return address is pushed on microstack 
SEQ.MUX    12:11   Selects source of next microaddress 
J          10:0    JUMP target address 


Table 9-2: Branch Format Control Field Definitions 

Name           Bit(s)  Description 

SEQ.FMT        14      1 for BRANCH 
SEQ.CALL       13      Controls whether return address is pushed on microstack 
SEQ.COND       12:8    Selects source of Microtest Bus 
BRANCH.OFFSET  7:0     Page offset of next microinstruction 
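
The two control field formats can be decoded with simple shifts and masks. The bit positions follow Tables 9-1 and 9-2; the function name, the dictionary keys, and the sample values are assumptions introduced for this sketch.

```python
# Sketch of decoding the 15-bit microsequencer control field (bits <14:0>).
# Field positions are from Tables 9-1 and 9-2; names here are hypothetical.

def decode_seq_field(field):
    fmt = (field >> 14) & 1          # SEQ.FMT: 0 = JUMP, 1 = BRANCH
    call = (field >> 13) & 1         # SEQ.CALL
    if fmt == 0:
        return {"format": "JUMP", "call": call,
                "mux": (field >> 11) & 0x3,     # SEQ.MUX <12:11>
                "j": field & 0x7FF}             # J <10:0>
    return {"format": "BRANCH", "call": call,
            "cond": (field >> 8) & 0x1F,        # SEQ.COND <12:8>
            "offset": field & 0xFF}             # BRANCH.OFFSET <7:0>


print(decode_seq_field(0x2024))   # JUMP format with SEQ.CALL set
print(decode_seq_field(0x4312))   # BRANCH format
```

Note that bits <12:11> belong to SEQ.MUX in the JUMP format but form the top of SEQ.COND in the BRANCH format, which is why consumers of those bits also need SEQ.FMT.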



9.2.2.3 MIB Latches 

The microword output from the Control Store 8-to-1 multiplexer is latched in Φ1 into the Control 
Store Microsequencer Microinstruction Buffer (CS_MIB) latch. The microword output from the 
Patchable Control Store is also latched in Φ1, into the PCS_MIB latch. The outputs of the CS_MIB 
and PCS_MIB latches drive a multiplexer, which selects the PCS_MIB output if the CAL hit 
in the Patchable Control Store; otherwise, the multiplexer selects the CS_MIB output. 

Bits <14:0> of the multiplexer output (the Microsequencer Microinstruction, 
E_USQ_CSM%UMIB_H<14:0>) are driven back to the microsequencer; bits <60:14,12:11> are driven to 
the Microinstruction Buffer (MIB) latch. The MIB latch operates in Φ2, driving its outputs 
(E_USQ%MIB_H) to S3 of 
the Ebox. When a microtrap is detected, the contents of this latch are forced to NOP. The MIB 
latch is stalled on a microsequencer stall. 

9.2.3 Next Address Logic 

The remainder of the microsequencer is devoted to determining the next control store lookup 
address. There are five next address sources: 

1. JUMP/BRANCH.OFFSET field of Microword 

2. Microtrap Logic 

3. Last Cycle Logic 

4. Microstack 

5. Test Address Generator 

9.2.3.1 CAL and CAL INPUT BUS 

The CAL, or Current Address Latch, is a static latch which holds the 11 bit address used to access 
the control store. It operates in Φ3, and is stalled on a microsequencer stall. Bits <10:8> are also 
"stalled" when forming a branch address. 

The input to the CAL is the CAL INPUT BUS. The CAL INPUT BUS is a dynamic bus, precharged in 
Φ2. The selected next address source drives this bus in Φ3. Bits <14,12:11> of the microsequencer 
control field are used in selecting three of the next address sources: E_USQ_CSM%UMIB_H<10:0> 
(for a BRANCH or JUMP address), the output of the last cycle logic, and the microstack output. 
The fourth CAL INPUT BUS source is the microtrap address; if a microtrap is detected, this 
input is selected regardless of the value of E_USQ_CSM%UMIB_H<14,12:11>. The fifth source is a 
test address, driven from the Test Address Generator. This input has the highest priority. In 
summary: 



Table 9-3: Current Address Selection 

TEST  TRAP      SEQ.FMT  SEQ.MUX  NEXT ADDRESS            REMARKS 
ADDR  DETECTED  <14>     <12:11>  SOURCE 

0     0         0        00       Jump Address            JUMP/CALL microinstructions 
0     0         1        XX       Branch Address          BRANCH/CONDITIONAL CALL microinstructions 
0     0         0        01       Microstack              RETURN microinstruction 
0     0         0        1X       Last Cycle Logic        Start new microflow 
0     1         X        XX       Microtrap Logic         Microtrap 
1     X         X        XX       Test Address Generator  Test address 
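
The selection priority summarized in Table 9-3 can be written as a short function. This is a sketch under stated assumptions: the function name and the returned strings are hypothetical, and the label "Jump Address" for the JUMP/CALL case is inferred from the surrounding text rather than taken from a legible table cell.

```python
# Sketch of next-address source selection per Table 9-3 (illustrative only).

def select_next_address_source(test_addr, trap_detected, seq_fmt, seq_mux):
    """Return the name of the source that drives the CAL INPUT BUS."""
    if test_addr:
        return "Test Address Generator"   # highest priority
    if trap_detected:
        return "Microtrap Logic"          # overrides the microword's own choice
    if seq_fmt == 1:
        return "Branch Address"           # BRANCH format: bits <12:11> are SEQ.COND
    if seq_mux == 0b00:
        return "Jump Address"             # inferred label for JUMP/CALL
    if seq_mux == 0b01:
        return "Microstack"               # RETURN
    return "Last Cycle Logic"             # SEQ.MUX = 1X starts a new microflow


print(select_next_address_source(0, 0, 0, 0b01))
print(select_next_address_source(0, 1, 0, 0b01))
```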



9.2.3.1.1 Microtest Bus 

The microtest bus allows conditional branches and conditional calls based on Ebox information, 
such as condition codes. The SEQ.COND field of the BRANCH format is driven on the microtest 
select lines, E_USQ%UTSEL_H<4:0>, in Φ23. These lines are decoded by all conditional information 
sources in the Ebox, and the selected source drives its information on the microtest bus, 
E_BUS%UTEST_H<2:0>, in NOT Φ1. E_BUS%UTEST_H must be valid in time to be OR'd with the value on 
the CAL INPUT BUS and latched in the CAL in Φ3. 

The sources for the microtest bus are as follows: 



Table 9-4: Microtest Bus Sources 

UTSEL<4:0>  Select             UTEST<2:0> 

00          No source          000 
01          ALU.NZV (2)        ALU_CC.N, ALU_CC.Z, ALU_CC.V 
02          ALU.NZC (2)        ALU_CC.N, ALU_CC.Z, ALU_CC.C 
03          B.2-0 (1)          EB_BUS<2:0> 
04          B.5-3 (1)          EB_BUS<5:3> 
05          A.7-5 (1)          EA_BUS<7:5> 
06          A.15-12 (1)        EA_BUS<15:14>, EA_BUS<13> OR EA_BUS<12> 
07          A31.BQA.BNZ1 (1)   EA_BUS<31>, EB_BUS<2:0> = 0, EB_BUS<15:8> NEQ 0 
08          MPU.0-6 (2)        MPU0_6<2:0> 
09          MPU.7-13 (2)       MPU7_13<2:0> 
0A          STATE.2-0 (2)      STATE<2:0> 
0B          STATE.5-3 (2)      STATE<5:3> 
0C          OPCODE.2-0 (1)     OPCODE<2:0> 
0D          PSL.26-24 (3)      PSL<26:24> 
0E          PSL.29.23-22 (3)   PSL<29>, PSL<23:22> 
0F          SHF.NZ (2), INT    SHF_CC.N, SHF_CC.Z, INTERRUPT_REQUEST 
10          VECTOR,TEST        ECR<VECTOR_UNIT_PRESENT> (3), TEST DATA, TEST STROBE 
11          FBOX               Encoded fault<1:0>, ECR<FBOX_ENABLED> = 0 (3) 
12          FQ.VR (1)          0, FIELD_QUEUE_NOT_VALID, FIELD_QUEUE_RMODE 
13-1F       Not Used 

(1) Data is taken from S3. 
(2) Data is taken from S4. 
(3) Data is taken from S5. 





The microtest select lines are always driven with bits <12:8> of the CAL regardless of the mi- 
croinstruction format. The microtest bus is only OR'd with the CAL INPUT BUS if the BRANCH 
source is selected to drive that bus. 
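
Branch address formation can be sketched from the pieces described so far: CAL<10:8> is held, BRANCH.OFFSET supplies bits <7:0>, and the microtest bus is OR'd into bits <3:1>. The function name and the sample values are hypothetical, and the alignment of UTEST<2:0> with CAL<3:1> is inferred from the control store description in Section 9.2.2 rather than stated explicitly here.

```python
# Sketch of BRANCH target formation (illustrative; bit alignment is inferred).

def branch_target(current_addr, branch_offset, utest):
    """Form the next microaddress for a BRANCH format microword.

    current_addr  -- present CAL<10:0>; bits <10:8> are kept ("stalled")
    branch_offset -- BRANCH.OFFSET<7:0> from the microword
    utest         -- UTEST<2:0> from the selected microtest source
    """
    page = current_addr & 0x700              # CAL<10:8> held from the current address
    target = page | (branch_offset & 0xFF)   # page offset from the microword
    target |= (utest & 0x7) << 1             # microtest bus OR'd into bits <3:1>
    return target


print(hex(branch_target(0x3A0, 0x40, 0b101)))
```

Because the microtest bits are OR'd in rather than substituted, microcode would typically place branch targets at offsets whose bits <3:1> are zero so that each condition value selects a distinct microword.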

Two of the microtest sources, the Field Queue (FQ) and the Mask Processing Unit (MPU), perform 
some function based on the value of the microtest select lines. These functions must check 
SEQ.FMT, E_USQ%MIB_H<14>, for validity of the microtest select lines. 

The microtest select lines are precharged to a value of zero; no microtest source is 
selected for this value. 

9.2.3.2 Microtrap Logic 

Microtraps allow the microcoder to deal with abnormal events that require immediate ser- 
vice. When a microtrap occurs, the microcode control is transferred to a service microroutine. 
Operations further behind in the pipe than the one which caused the microtrap are aborted. 

Microtraps are generated by the Ebox, Mbox, or Ibox. Those Ebox microtrap requests considered 
faults are asserted in S4 of the microinstruction in which they occurred. Those that are considered 
traps are asserted in S5 of the microinstruction in which they occurred. 

Microtraps have higher priority than all other next address sources except the Test Address 
Generator. Microtraps are detected in Φ4. The microtrap signals are OR'd together in Φ1 to form 
E_USQ%PE_ABORT_H. The trap signals are prioritized and address lookup is done to select the 
appropriate microtrap handler address, which is driven on the CAL INPUT BUS in Φ3. 

9.2.3.3 Last Cycle Logic 

The last cycle logic examines several conditions used to determine which new microflow is to be 
taken when LAST.CYCLE or LAST.CYCLE.OVERFLOW is detected on E_USQ_CSM%UMIB_H, no 
microtraps are detected, and no test address is driven. There are five possible new microflows, 
listed in order of priority: 

1. Interrupt Request Handler 

2. Trace Fault Handler 

3. First Part Done Handler 

4. Instruction Queue Stall 

5. The macroinstruction microcode indicated by the top entry in the instruction queue. 

The last cycle logic prioritizes these sources and performs address lookup. In addition, the signal 
E_USQ_LST%SELECT_IQ_H is derived. This signal is asserted when an entry is taken from the 
instruction queue. 



Table 9-5: Microaddresses for Last Cycle Interrupts or Exceptions 

Priority  Interrupt or Exception   Dispatch Address (Hex) 

1         Interrupt request        24 
2         Trace fault              28 
3         First part done          2C 
4         Instruction Queue Stall  30 



The priorities in the last cycle logic are assigned using the following dependencies: 

1. Interrupts and trace faults must be handled between instructions. (Interrupts may also be 
serviced at defined points during long instructions such as string instructions; this servicing 
is handled by microcode.) 

2. By definition, an interrupt that is permitted to request service has a higher priority level 
(IPL) than any exception that occurs in the process to be interrupted, or any instruction to 
be executed by that process. 

3. When tracing is enabled (PSL<TP> is set), a trace fault must be taken before the execution 
of each instruction. 

4. If an instruction begins execution with PSL<FPD> set, the first part done handler must be 
entered rather than the normal entry point for the instruction. 

5. PSL<TP> and PSL<FPD> cannot both be set when an instruction begins execution. In order 
for PSL<FPD> to be set, the instruction must have been interrupted previously; the interrupt 
handler always clears PSL<TP> before saving the PSL when interrupting an instruction. 
(Note that the interrupt handler does not clear PSL<TP> when the interrupt is taken between 
instructions.) 

6. The Instruction Queue Stall microword is executed if an opcode is requested from the 
Instruction Queue but the queue is empty. 
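
The last cycle priority scheme can be sketched as follows. The dispatch addresses are from Table 9-5 and the use of PSL<TP> and PSL<FPD> follows the dependencies listed above; the function name, its arguments, and the sample instruction queue address are hypothetical. The second return value models when E_USQ_LST%SELECT_IQ_H would be asserted.

```python
# Sketch of last cycle dispatch selection (illustrative only).
DISPATCH = {                 # hex microaddresses from Table 9-5
    "interrupt": 0x24,
    "trace_fault": 0x28,
    "first_part_done": 0x2C,
    "iq_stall": 0x30,
}

def last_cycle_dispatch(interrupt_request, psl_tp, psl_fpd, iq_entry):
    """Return (microaddress, select_iq) for the new microflow.

    iq_entry -- dispatch address from the top instruction queue entry,
                or None if the queue is empty
    """
    if interrupt_request:
        return DISPATCH["interrupt"], False
    if psl_tp:
        return DISPATCH["trace_fault"], False
    if psl_fpd:
        return DISPATCH["first_part_done"], False
    if iq_entry is None:
        return DISPATCH["iq_stall"], False    # the STALL microword
    return iq_entry, True                     # an entry is taken from the queue


print(last_cycle_dispatch(False, False, False, 0x1A0))
print(last_cycle_dispatch(False, False, False, None))
```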

9.2.3.4 Microstack 

Frequently used microcode can be made into microsubroutines. When a microsubroutine is called, 
the return address is pushed onto the microstack. The output of the microstack is driven on the 
CAL INPUT BUS when a RETURN is decoded from the E_USQ_CSM%UMIB_H, no microtraps are 
detected, and no test address is driven. 

The microstack is 6 entries deep. It is a circular stack, with the write pointer always one entry 
ahead of the read pointer. Each entry is an 11-bit control store address. The addresses stored in 
the microstack incorporate any modification done by the microtest bus. 
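
The circular organization of the microstack can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the hardware implementation: the class and method names are hypothetical, the model assumes that a push writes at the write pointer and then advances it, and that a return reads the entry one position behind the write pointer and then backs the pointer up.

```python
class Microstack:
    """Behavioral sketch of a 6-entry circular stack of 11-bit addresses."""

    DEPTH = 6
    ADDR_MASK = 0x7FF  # 11-bit control store address

    def __init__(self):
        self.entries = [0] * self.DEPTH
        self.write_ptr = 0  # assumed reset value

    @property
    def read_ptr(self):
        # The write pointer is always one entry ahead of the read pointer.
        return (self.write_ptr - 1) % self.DEPTH

    def push(self, return_addr):
        """Model a microsubroutine call: store the return address."""
        self.entries[self.write_ptr] = return_addr & self.ADDR_MASK
        self.write_ptr = (self.write_ptr + 1) % self.DEPTH

    def pop(self):
        """Model a RETURN: read the newest entry and step back one slot."""
        addr = self.entries[self.read_ptr]
        self.write_ptr = self.read_ptr
        return addr
```

Because the stack is circular, a seventh nested call in this model silently overwrites the oldest return address; the model does not detect overflow.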

9.2.4 Stall Logic 

The microsequencer is stalled whenever S3 is stalled. The Ebox derives the signal
E_STL%USEQ_STALL_H, which is used to stall the microsequencer. The microsequencer creates
delayed versions of this signal as needed to stall various latches. The signals E_USQ%PE_ABORT_H
(asserted on initiation of a microtrap) and E_USQ_TST%FORCE_TEST_ADDR_H (asserted on detection
of the Test Address Generator driving a control store microaddress, see Section 9.5) break a
microsequencer stall by clearing the delayed versions of E_STL%USEQ_STALL_H.

9.3 Initialization 

A reset (assertion of K_E%RESET_L) causes the microsequencer to initialize in the following state: 

• A powerup microtrap is initiated. 

• The microstack pointer is reset to zero. 

• The instruction queue is flushed and its pointers are reset by E_MSC%FLUSH_EBOX_H.

9.4 Microcode Restrictions 

1. Every microtrap except Branch Mispredict must contain a RESET.CPU in order to reset the
Instruction Queue. (The Ebox is flushed automatically, clearing the queues, on detection
of branch mispredict.) RESET.CPU must not be issued within the 3 microwords preceding
LAST.CYCLE in order to allow time for the Instruction Queue to be cleared (if RESET.CPU
is present in microword N, LAST.CYCLE cannot be present until microword N+4).

2. For correct operation of Trace Fault and First Part Done in the Last Cycle Logic, PSL<T,TP,FPD>
must not be changed within the 2 microwords preceding LAST.CYCLE (if any of these PSL
bits are changed in microword N, LAST.CYCLE cannot be present until microword N+3).

3. No Ebox-initiated memory requests can be made in the last cycle of a microflow, other than
writes with the translation already known to be valid.

4. No Ebox-initiated memory requests can be outstanding when the microcode references an
operand (queue entry or register file location).

5. The instruction queue stall microword must indicate LAST.CYCLE.
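
Restrictions 1 and 2 are spacing rules that lend themselves to a mechanical check. The sketch below is a hypothetical lint pass, not an existing tool: it assumes a straight-line microflow represented as a list of sets of annotation strings (one set per microword), and the tags `RESET.CPU`, `PSL_WRITE`, and `LAST.CYCLE` are illustrative labels chosen for this example.

```python
# (rule number, tag, earliest allowed LAST.CYCLE offset after the tag)
RULES = (
    (1, "RESET.CPU", 4),   # RESET.CPU in word N -> LAST.CYCLE no sooner than N+4
    (2, "PSL_WRITE", 3),   # PSL<T,TP,FPD> change in N -> LAST.CYCLE no sooner than N+3
)


def check_last_cycle_spacing(flow):
    """Return a list of (rule, offending_word, last_cycle_word) tuples.

    `flow` is a list of sets of tags, one set per microword, in execution
    order. Branches and microsubroutine calls are not modeled.
    """
    violations = []
    for n, word in enumerate(flow):
        if "LAST.CYCLE" not in word:
            continue
        for rule, tag, min_gap in RULES:
            # Any tagged word in the window [n - min_gap + 1, n] is too close.
            for k in range(max(0, n - min_gap + 1), n + 1):
                if tag in flow[k]:
                    violations.append((rule, k, n))
    return violations
```

A flow with RESET.CPU in microword 0 and LAST.CYCLE in microword 3 is reported as a rule 1 violation, while moving LAST.CYCLE to microword 4 passes.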

9.5 Testability 
9.5.1 Test Address 

The control store microaddress is both controllable and observable. A microcode address can be
driven to the microsequencer from the Test Address Generator. The Test Address Generator is an
11-bit counter which is initialized to a value of zero on assertion of K_E%RESET_L. It increments
its address counter once on each deassertion of T%CS_TEST_H, thus cycling through all possible
control store addresses.

This microaddress source takes priority over all others. To ensure immediate control store lookup
using this microaddress, assertion of T%CS_TEST_H sets an S/R latch whose output is
E_USQ_TST%FORCE_TEST_ADDR_H. Assertion of this signal breaks any stall on Φ3 and Φ4 latches in
the microsequencer. This allows the control store to operate, driving the selected microword into
the MIB scan chain (see Section 9.5.2). The Ebox stall(s), if any, are unaffected, along with stalls
on Φ1 latches in the microsequencer.

E_USQ_TST%FORCE_TEST_ADDR_H is deasserted when the Test Address Generator has completed 
generation of all possible addresses. 

The microaddress driven from the CAL can be observed on the Parallel Test Port data pins,
along with the microsequencer stall signal, under control of the Parallel Test Port command pins.
The microsequencer drives to the Parallel Test Port in Φ2.

Figure 9-3: Parallel Port Output Format

 11 10 09 08 07 06 05 04 03 02 01 00
+--------------------------------+--+
|           CAL OUTPUT           |  |
+--------------------------------+--+
                                   |
                  USEQ_STALL ------+

Table 9-6: Parallel Port Output Format Field Definitions

Name        Bit(s)  Description

CAL OUTPUT  11:1    Microaddress driven from CAL
USEQ_STALL  0       Microsequencer stall, E_USQ_STL%VERY_LATE_USQ_STALL_H
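
Test software that captures the Parallel Test Port data pins would need to split the 12-bit value into the two fields of Table 9-6. The following Python helpers are a minimal sketch; the function names are hypothetical and only the bit positions come from the table.

```python
def unpack_parallel_port(word):
    """Split a 12-bit Parallel Test Port value into its two fields."""
    word &= 0xFFF
    return {
        "cal_output": word >> 1,   # bits 11:1, 11-bit microaddress
        "useq_stall": word & 1,    # bit 0, microsequencer stall
    }


def pack_parallel_port(cal_output, useq_stall):
    """Inverse of unpack_parallel_port, useful for building test vectors."""
    return ((cal_output & 0x7FF) << 1) | (1 if useq_stall else 0)
```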
9.5.2 MIB Scan Chain

A 91-bit scan chain is present at the input to the MIB, allowing the complete microword to be
latched and scanned out of the chip.

In addition, microcode patches are written into the patchable control store via the MIB scan 
chain. 



Table 9-7: Contents of MIB Scan Chain

Extent   Description

<90:83>  E_USQ%MIB_H<7:0>
<82:61>  E_USQ%MIB_H<60:38>
<60:50>  CAM READ ADDRESS<10:0>
<49:20>  E_USQ%MIB_H<37:8>
<19:0>   CAM WRITE ADDRESS<19:0>






Revision History

Table 9-8: Revision History

Rev       Who                  When         Description of change

0.0       Elizabeth M. Cooper  06-Mar-1989  Release for external review.
0.1       Elizabeth M. Cooper  14-Sep-1989  Post-modelling update.
0.5       Elizabeth M. Cooper  10-Dec-1989  Updates for Rev 0.5 spec release.
0.5A      Elizabeth M. Cooper  5-Jan-1990   Remove vector microtrap and V bit from IQ.
0.5B      Elizabeth M. Cooper  20-Jun-1990  Accumulated updates.
Plus 0.1  Gil Wolrich          15-Nov-1990  Changes for NVAX Plus, retain block diagram and
                                            test features.

Chapter 10

The Interrupt Section



10.1 Overview 

NVAX Plus inputs six external interrupt signals as IRQ_H<3:0>, HALT_H, and ERR_H. These
signals are hardwired; IRQ_H<3:0> and ERR_H are level sensitive, and HALT_H is edge
sensitive. The interrupts are non-vectored, with the SCB vector for each being predetermined.
It is the responsibility of the interrupt software to determine the interrupt source and reset the
interrupt. An explicit power fail interrupt is not implemented.

Internal interrupts include INT_TIM_H, H_ERR_H, S_ERR_H, PERFORMANCE MONITOR
FACILITY, and the architecturally defined Software Interrupt Requests. The full Interval Timer
Implementation is present in the NVAX Plus chip, and thus no special considerations for the
subset are necessary.

The interrupt section receives interrupt requests from both internal and external sources, and 
compares the IPL associated with the interrupt request to the current interrupt level in the PSL. If 
the interrupt request is for an IPL that is higher than the current PSL IPL, the interrupt section 
signals an interrupt request to the microsequencer which will initiate a microcode interrupt 
handler at the next macroinstruction boundary. 

When an interrupt is serviced by the Ebox microcode, the interrupt section provides an encoded 
interrupt ID on E_BUS%ABUS, which allows the microcode to determine the highest priority in- 
terrupt request that is pending. Interrupt requests are cleared in one of two ways, depending on 
the type of request. 

Software interrupt requests are supported via a 15-bit SISR register, which is read and written 
by the microcode, and which makes requests to the interrupt generation logic. 

10.2 Interrupt Summary

Interrupt requests received from external logic are synchronized to internal clocks. In addition,
there are several internal sources of interrupt requests which are received by edge-sensitive logic.

10.2.1 External Interrupts

HALT_H, ERR_H, and four external device interrupts are input to NVAX Plus.



Interrupt  Request IPL    SCB Vector
Request    (Hex)  (Dec)   (Hex)

HALT_H     1F     31      CONSOLE
ERR_H      1D     29      60
IRQ_H<3>   17     23      DC
IRQ_H<2>   16     22      D8
IRQ_H<1>   15     21      D4
IRQ_H<0>   14     20      D0



10.2.1.1 HALT_H Interrupt Received by Edge-Sensitive Logic 

The low to high transition of HALT_H causes the CPU to enter the console code, through the
address stored in the CHALT IPR register, at IPL 1F (hex) at the next macroinstruction boundary.
This interrupt is not gated by the current IPL, and always results in console entry, even if the
IPL is already 1F (hex). Note that the implementation of this event is different from a normal
interrupt in which a PC/PSL pair are pushed on the interrupt stack. For this event, the current
PC, PSL, and halt code are stored in the SAVPC and SAVPSL processor registers. Microcode
clears the SR latch when the HALT interrupt is recognized by writing to the appropriate bit in
the ISR.

10.2.1.2 External Interrupt Requests Received by Level-Sensitive Logic 

Five external interrupt requests are received by level-sensitive logic and synchronized to internal 
clocks. These signals request general-purpose interrupts at the following IPLs. 

• ERR_H: The assertion of ERR_H indicates that an error has been detected in the system
environment. This results in the dispatch of the interrupt to the operating system at IPL 1D
(hex) through SCB vector 60 (hex).

• IRQ_H<3:0>: Device interrupts resulting in dispatch of the interrupt to the operating system
at IPL 14-17 (hex) through SCB vector D0, D4, D8, or DC (hex).

Each signal must be driven HIGH and remain HIGH to assert the interrupt request. Interrupt 
routines at the specified SCB acknowledge the interrupt. 

NOTE

HALT_H is the EV IRQ_H<4> pin, and ERR_H is the EV IRQ_H<5> pin.

10.2.2 Internal Interrupt Requests

The Cbox, Ibox, and Mbox report error conditions by asserting internal interrupt request signals.
The H_err signal is ORed with ERR_H, while S_err inputs directly. H_err causes an interrupt to
SCB 60 (hex); S_err causes an interrupt to SCB 54 (hex).

The performance monitoring facility requests an interrupt at IPL 1B (hex) when the performance
counters become half full. This request is serviced entirely by microcode, and cleared by writing
to the appropriate bit in the ISR.

The assertion of INT_TIM_H indicates that the interval timer period has expired and ICCS<6>
is set. The interrupt is dispatched to the operating system at IPL 16 (hex) through SCB vector
C0 (hex).

Architecturally defined software interrupt requests are implemented through an internal register 
in the interrupt section. Under control of the SISR and SIRR processor registers which are 
described in Chapter 2, the Ebox microcode sets the appropriate bit in this register, which then 
results in the dispatch of the interrupt to the operating system at an IPL and through the SCB 
vector implied by the interrupt request. The association between the interrupt request, requested 
IPL, and SCB vector for these requests is shown in the following table. 





           Request IPL    SCB Vector
SISR bit   (Hex)  (Dec)   (Hex)

SISR<15>   0F     15      BC
SISR<14>   0E     14      B8
SISR<13>   0D     13      B4
SISR<12>   0C     12      B0
SISR<11>   0B     11      AC
SISR<10>   0A     10      A8
SISR<09>   09     09      A4
SISR<08>   08     08      A0
SISR<07>   07     07      9C
SISR<06>   06     06      98
SISR<05>   05     05      94
SISR<04>   04     04      90
SISR<03>   03     03      8C
SISR<02>   02     02      88
SISR<01>   01     01      84



Ebox microcode explicitly clears the interrupt request when the interrupt is serviced. 
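
The software interrupt table follows a regular pattern: the request IPL equals the SISR bit number, and the SCB vector advances by 4 per level from 84 (hex) for SISR<01>. The Python sketch below restates that pattern; the function names are hypothetical, and the formula is derived from the table rows rather than quoted from the specification.

```python
def sisr_request(level):
    """Return the request IPL and SCB vector for SISR<level>, 1 <= level <= 15."""
    if not 1 <= level <= 15:
        raise ValueError("SISR implements bits 15:1 only")
    return {"ipl": level, "scb_vector": 0x80 + 4 * level}


def highest_pending_sisr(sisr):
    """Return the highest pending software interrupt level, or 0 if none."""
    pending = sisr & 0xFFFE  # bit 0 is not part of SISR<15:1>
    return pending.bit_length() - 1 if pending else 0
```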



10.2.3 Special Considerations for Interval Timer Interrupts

NVAX Plus does not implement the subset Interval Timer and does not require a copy of ICCS<6>
at the Interrupt Section.

10.2.4 Priority of Interrupt Requests

When multiple interrupt requests are pending, the interrupt section prioritizes the requests. 
Table 10-1 shows the relative priority (from highest to lowest) of all interrupt requests. For 
reference, this table also includes the IPL at which the interrupt is taken, and the SCB vector 
through which the interrupt is dispatched. 

Table 10-1: Relative Interrupt Priority

Interrupt                     Request IPL    SCB Vector
Request                       (Hex)  (Dec)   (Hex)

HALT_H                        1F     31      None (1)     Highest priority
ERR_H (2)                     1D     29      60
Performance Monitor Facility  1B     27      58 (3)
S_ERR_L (2)                   1A     26      54
IRQ_H<3>                      17     23      DC
IRQ_H<2>                      16     22      D8
INT_TIM_L                     16     22      C0
IRQ_H<1>                      15     21      D4
IRQ_H<0>                      14     20      D0
SISR<15>                      0F     15      BC
SISR<14>                      0E     14      B8
SISR<13>                      0D     13      B4
SISR<12>                      0C     12      B0
SISR<11>                      0B     11      AC
SISR<10>                      0A     10      A8
SISR<09>                      09     09      A4
SISR<08>                      08     08      A0
SISR<07>                      07     07      9C
SISR<06>                      06     06      98
SISR<05>                      05     05      94
SISR<04>                      04     04      90
SISR<03>                      03     03      8C
SISR<02>                      02     02      88
SISR<01>                      01     01      84           Lowest priority

(1) Direct dispatch to console; PC, PSL placed in SAVPC, SAVPSL processor registers
(2) Includes Cbox, Ibox, and Mbox internally generated requests
(3) Interrupt processed entirely by microcode

The IRQ_H<2> request takes priority over the INT_TIM_L request, both of which are at IPL 16
(hex).

10.3 Interrupt Section Structure

The interrupt section consists of three basic components: the synchronization logic, the interrupt
state register (ISR), and the interrupt generation logic. A block diagram of the interrupt section
is shown in Figure 10-1.

Figure 10-1: Interrupt Section Block Diagram

[Block diagram not recoverable. Legible block labels: INTERRUPT PRIORITY ENCODER,
COMPARATOR. Legible signals: E%INT_REQ, E_BUS%ABUS.]


10.3.1 Synchronization Logic 

The pads for the six external interrupt request signals contain synchronizers to allow the use
of asynchronous signals for interrupt requests. The synchronized signals are then passed to the
ISR.

10.3.2 Interrupt State Register

The interrupt state register is a composite register that implements the 15-bit architecturally
defined SISR register, the interrupt latch for the performance monitoring facility interrupt,
internal S_err, and the interrupt request latches for the six external interrupts. The ISR contains
two kinds of elements: SR flops for the internal interrupt requests, and latches for the external
and software request interrupts. The following table lists the types and positions of all elements
in the ISR.



         State
ISR bit  Element  Description

31       SR       Interrupt request for HALT_H interrupt
29       L        Interrupt request for ERR_H and internal C%CBOX_H_ERR from BIU_STAT
28       SR       Interrupt request for performance monitoring facility interrupt
27       SR       Interrupt request for S_ERR_L / internal soft error interrupts
26       L        Interrupt request for IRQ_H<3> interrupt
25       L        Interrupt request for IRQ_H<2> interrupt
24       SR       Interrupt request for INT_TIM_L interrupt
23       L        Interrupt request for IRQ_H<1> interrupt
22       L        Interrupt request for IRQ_H<0> interrupt
15:1     L        SISR<15:1> latches and requests for software interrupts

State Element:
SR = SR flop
L  = Latch



The HALT_H interrupt request is loaded into the request flop in ISR<31>. The request is cleared
under Ebox microcode control when written with a 1 from E%WBUS.

Internal requests from the Cbox, Ibox, and Mbox are signaled on internal request lines. The
assertion of one of these signals causes the appropriate request flop to be set in ISR<27,24>.
These request flops are cleared under Ebox microcode control when written with a 1 from E%WBUS.

The performance monitoring facility interrupt request is loaded into the request flop in ISR<28>.
The request is cleared under Ebox microcode control when written with a 1 from E%WBUS.

SISR<15:1> is implemented via ISR<15:1>, and is loaded from bits <15:1> of E%WBUS under Ebox
microcode control. These request latches are cleared under Ebox microcode control when a new
value is loaded from E%WBUS.

The interval timer request from ISR<24> is not gated with ISR<0>, as only a single version of
ICCS<6> exists for NVAX Plus. NVAX Plus does not implement ISR<0>. ISR<31:22,15:1> go to
the interrupt generation logic. ISR<15:1> may also be read onto E_BUS%ABUS for return to the
Ebox.

10.3.3 Interrupt Generation Logic

The interrupt generation logic priority encodes all interrupt requests from the interrupt state
register to determine the highest priority request. The output of the encoder is the request IPL
and the interrupt ID of the highest priority request. If any request is pending, the request IPL is
compared against E%PSL<20:16> from the Ebox. If the request IPL is higher than the PSL IPL,
or if the request is for HALT_H (HALT_H is not gated by the IPL), E%INT_REQ is asserted to the
microsequencer.
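
The encode-and-compare behavior described above can be modeled in a few lines. The following Python sketch is illustrative and is not the gate-level design: the table and function names are hypothetical, the ISR bit positions are taken from the ISR element table in Section 10.3.2, and the ordering is taken from Table 10-1.

```python
# (name, ISR bit, request IPL), highest priority first.
REQUESTS = [
    ("HALT_H", 31, 0x1F),
    ("ERR_H", 29, 0x1D),
    ("PMON", 28, 0x1B),
    ("S_ERR", 27, 0x1A),
    ("IRQ_H<3>", 26, 0x17),
    ("IRQ_H<2>", 25, 0x16),
    ("INT_TIM", 24, 0x16),
    ("IRQ_H<1>", 23, 0x15),
    ("IRQ_H<0>", 22, 0x14),
] + [("SISR<%d>" % n, n, n) for n in range(15, 0, -1)]


def interrupt_request(isr, psl_ipl):
    """Return (name, request_ipl, int_req) for the highest priority request.

    int_req models E%INT_REQ: asserted when the request IPL exceeds the
    PSL IPL, or unconditionally for HALT_H. Returns (None, 0, False) when
    no request bit is set in `isr`.
    """
    for name, bit, ipl in REQUESTS:
        if (isr >> bit) & 1:
            return name, ipl, (name == "HALT_H" or ipl > psl_ipl)
    return None, 0, False
```

With both ISR<25> and ISR<24> set, the model selects IRQ_H<2> ahead of the interval timer, matching the tie-break stated for IPL 16 (hex).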

The assertion of E%INT_REQ causes the microsequencer to initiate a microcode interrupt handler
at the next macroinstruction boundary. The same signal is available on the microtest bus as a
microbranch condition, which is checked by the Ebox microcode during long instructions.

Along with the request IPL, the interrupt generation logic provides an encoded interrupt ID that
identifies the highest priority interrupt. The interrupt ID is read onto E_BUS%ABUS along with
ISR<15:1> when microcode references the A/INT.SYS source. For each interrupt, the interrupt
ID encoding, request IPL, ISR bit number, method for clearing the interrupt, and SCB vector are
shown in Table 10-2.



Table 10-2: Summary of Interrupts

Interrupt      Int ID          Request IPL    ISR Bit  Reset                 SCB Vector
Request        (Hex)   (Dec)   (Hex)  (Dec)   (Dec)    Method                (Hex)

HALT_H         1F      31      1F     31      31       Write 1 to ISR bit    Console Halt
ERR_H (1)      1D      29      1D     29      29       By H_ERR handler      60
E_PMN%PMON_L   1B      27      1B     27      28 (2)   Write 1 to ISR bit    58 (handled by microcode)
S_ERR_L (1)    1A      26      1A     26      27 (2)   Write 1 to ISR bit    54
IRQ_H<3>       17      23      17     23      26       By interrupt routine  DC
IRQ_H<2>       16      22      16     22      25       By interrupt routine  D8
INT_TIM_L      1C (3)  28      16     22      24 (2)   Write 1 to ISR bit    C0
IRQ_H<1>       15      21      15     21      23       By interrupt routine  D4
IRQ_H<0>       14      20      14     20      22       By interrupt routine  D0
SISR<15>       0F      15      0F     15      15       Write 0 to ISR bit    BC
SISR<14>       0E      14      0E     14      14       Write 0 to ISR bit    B8
SISR<13>       0D      13      0D     13      13       Write 0 to ISR bit    B4
SISR<12>       0C      12      0C     12      12       Write 0 to ISR bit    B0
SISR<11>       0B      11      0B     11      11       Write 0 to ISR bit    AC
SISR<10>       0A      10      0A     10      10       Write 0 to ISR bit    A8
SISR<09>       09      09      09     09      09       Write 0 to ISR bit    A4
SISR<08>       08      08      08     08      08       Write 0 to ISR bit    A0
SISR<07>       07      07      07     07      07       Write 0 to ISR bit    9C
SISR<06>       06      06      06     06      06       Write 0 to ISR bit    98
SISR<05>       05      05      05     05      05       Write 0 to ISR bit    94
SISR<04>       04      04      04     04      04       Write 0 to ISR bit    90
SISR<03>       03      03      03     03      03       Write 0 to ISR bit    8C
SISR<02>       02      02      02     02      02       Write 0 to ISR bit    88
SISR<01>       01      01      01     01      01       Write 0 to ISR bit    84
No Interrupt   00      00                              Dismiss interrupt

(1) Includes Cbox, Ibox, and Mbox internally generated requests
(2) Write-1-to-clear ISR bit is different than IPL and interrupt ID
(3) Interrupt ID is different than IPL


The interrupt ID is the same as the request IPL for all interrupt requests except for the interval 
timer request. 

DESIGN CONSTRAINT 

A value of zero for the interrupt ID must be returned if an interrupt is no longer 
present, or if the highest priority interrupt request is no longer higher than the PSL 
IPL. Normally, once an interrupt request is made, it remains until it is cleared by the 
microcode. However, the level-sensitive interrupt requests may be deasserted after the 
interrupt is dispatched, but before the microcode reads the interrupt ID. Therefore, it is 
possible that the highest remaining interrupt has a request IPL lower than the current 
PSL IPL. If zero is not returned for the interrupt ID in this instance, the processor will 
not function correctly. 
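
The design constraint can be restated as a small function over the output of the priority encoder. This is a hedged sketch: the function and parameter names are hypothetical, the interval timer ID of 1C (hex) comes from Table 10-2, and the treatment of HALT_H (returned regardless of the PSL IPL because HALT_H is not gated by the IPL) is an assumption of this model rather than a statement from the specification.

```python
INT_TIM_ID = 0x1C  # interval timer ID differs from its request IPL of 16 (hex)


def interrupt_id(request_ipl, psl_ipl, is_interval_timer=False, is_halt=False):
    """Return the value of the INT.ID field seen by microcode.

    request_ipl is the IPL of the highest priority pending request, or
    None if no request is pending (for example, a level-sensitive request
    that was deasserted after dispatch).
    """
    if request_ipl is None:
        return 0  # no interrupt: microcode dismisses
    if not is_halt and request_ipl <= psl_ipl:
        return 0  # remaining request is not above the PSL IPL
    return INT_TIM_ID if is_interval_timer else request_ipl
```

In this model, a device request at IPL 17 (hex) that disappears before the ID is read, leaving only a software request at IPL 05 with the PSL IPL at 10 (hex), yields an ID of zero.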

10.4 Ebox Microcode interface 

The Ebox microcode interfaces with the interrupt section primarily through reads (via
E_BUS%ABUS) and writes (via E%WBUS) of the ISR accomplished through the A/INT.SYS and
DST/INT.SYS decodes. These decodes provide access to the so-called INT.SYS register, which
is shown in Figure 10-2. The fields of the register are listed in Table 10-3.

Figure 10-2: INT.SYS Register Format

 31 30 29 28 27 26 25 24 23 22 21 20          16 15                01 00
+--+--+--+--+--+--+--+--+--+--+--+--------------+--------------------+--+
|  | 0| 0|  |  | 0| 0|  | 0| 0| 0|    INT.ID    |     SISR<15:1>     |  |
+--+--+--+--+--+--+--+--+--+--+--+--------------+--------------------+--+
  |        |  |        |
  |        |  |        +-- INT_TIM_RESET
  |        |  +----------- S_ERR_RESET
  |        +-------------- PMON_RESET
  +----------------------- HALT_RESET

Table 10-3: INT.SYS Register Fields

Name           Bit(s)  Type  Description

SISR           15:1    RW,0  This field contains the 15 architecturally-defined software interrupt
                             request bits. It is set to 0 by microcode at powerup.

INT.ID         20:16   RO    This field contains the encoding of the highest priority interrupt
                             request as listed in Table 10-2. Writes to this field are ignored.

INT_TIM_RESET  24      WC,0  Writing a 1 to this field clears the INT_TIM_L interrupt request.
                             Writing a 0 has no effect on the request. The field is read as a 0
                             and the interrupt request is cleared by microcode at powerup.

S_ERR_RESET    27      WC,0  Writing a 1 to this field clears the S_ERR_L interrupt request. Writing
                             a 0 has no effect on the request. The field is read as a 0 and the
                             interrupt request is cleared by microcode at powerup.

PMON_RESET     28      WC,0  Writing a 1 to this field clears the E_PMN%PMON_L interrupt request.
                             Writing a 0 has no effect on the request. The field is read as a 0 and
                             the interrupt request is cleared by microcode at powerup.

HALT_RESET     31      WC,0  Writing a 1 to this field clears the HALT_H interrupt request. Writing
                             a 0 has no effect on the request. The field is read as a 0 and the
                             interrupt request is cleared by microcode at powerup.



DESIGN CONSTRAINT 

When read onto E_BUS%ABUS, INT.SYS<31,28,27,24> must be zero. Microcode updates
the internal copy of SISR<15:1> by reading the INT.SYS register, modifying the appropriate
bits, and writing the updated value back. The write-one-to-clear bits must be
read as zero because the microcode does not mask them out before writing them back.
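
The reason for this constraint is easiest to see in a behavioral model of the read-modify-write sequence. The Python class below is a sketch under stated assumptions: the class, attribute, and method names are hypothetical, and only the field positions and the write-one-to-clear behavior are taken from Table 10-3.

```python
W1C_MASK = (1 << 31) | (1 << 28) | (1 << 27) | (1 << 24)
SISR_MASK = 0xFFFE          # INT.SYS<15:1>
INT_ID_SHIFT = 16           # INT.ID occupies INT.SYS<20:16>


class IntSys:
    """Behavioral sketch of the INT.SYS register as seen by microcode."""

    def __init__(self):
        self.pending_w1c = 0  # latched HALT, PMON, S_ERR, INT_TIM requests
        self.sisr = 0         # SISR<15:1>, held in bits 15:1
        self.int_id = 0       # supplied by the interrupt generation logic

    def read(self):
        # The write-one-to-clear positions always read as zero.
        return ((self.int_id & 0x1F) << INT_ID_SHIFT) | (self.sisr & SISR_MASK)

    def write(self, value):
        # A 1 in a write-one-to-clear position clears that request.
        self.pending_w1c &= ~(value & W1C_MASK)
        # SISR<15:1> is loaded directly; INT.ID is read-only.
        self.sisr = value & SISR_MASK
```

If `read()` returned the pending request bits instead of zeros, the unmasked write-back used to update SISR<15:1> would clear every pending request in bits 31, 28, 27, and 24 as a side effect.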

MICROCODE RESTRICTION 

The INT.SYS register is not bypassed. A write to INT.SYS in microinstruction n must 
not be followed by a read of INT.SYS sooner than microinstruction 

MICROCODE RESTRICTION 

Changes to machine state that affect the generation of interrupts (PSL<IPL>, or 
SISR<15:1>) done by microinstruction n must not be followed by a LAST CYCLE mi- 
croinstruction sooner than microinstruction if the change is to be observed by the 
next macroinstruction. 

10.5 Processor Register Interface 

Software can interact with the interrupt section hardware and microcode via references to pro- 
cessor registers, as follows: 

• SISR, SIRR: References to the architecturally-defined SISR and SIRR processor registers
allow access to SISR<15:1>, which are implemented in INT.SYS<15:1>.

• INTSYS: References to the INTSYS processor register allow diagnostic and test software
direct access to the INT.SYS register. Reads of the INTSYS processor register return the
format shown in Figure 10-2. Writes of the INTSYS processor register are internally masked
by microcode such that only the left half write-to-clear bits are written. Other bits remain
unchanged. Writes to the INTSYS processor register during normal system operation can result in
UNDEFINED behavior.

10.6 Interrupt Section Interfaces 

10.6.1 Ebox Interface 

10.6.1.1 Signals From Ebox 

• E%PSL<20:16>: IPL field from the current PSL. 

• E%WBUS: Write data bus, from which SISR<15:1> are loaded, and from which the write-one- 
to-clear interrupt latches are cleared. 

• E_PMN%PMON_L: Performance monitoring facility interrupt request.

10.6.1.2 Signals To Ebox 

• E_BUS%ABUS: A-port operand bus, on which SISR<15:1> and the interrupt ID are returned. 

10.6.2 Microsequencer Interface

10.6.2.1 Signals from Microsequencer

• E_USQ_CSM%UTSEL: Microtest bus select code.

10.6.2.2 Signals To Microsequencer

• E%INT_REQ: Interrupt pending.

• E_BUS%UTEST: Microtest bus.

10.6.3 Cbox Interface
10.6.3.1 Signals From Cbox 

• C%CBOX_H_ERR: Hard error interrupt request. 

• C%CBOX_S_ERR: Soft error interrupt request. 

• INT_TIM_L: Interval timer interrupt signal.

10.6.4 Ibox Interface
10.6.4.1 Signals From Ibox

• I%IBOX_S_ERR: Soft error interrupt request.

10.6.5 Mbox Interface 
10.6.5.1 Signals From Mbox 

• M%MBOX_S_ERROR: Soft error interrupt request. 

10.6.6 Pin Interface 
10.6.6.1 Input Pins 

• HALT_H: Halt interrupt signal 

• ERR_H: Error interrupt signal 

• IRQ_H<3:0>: General-purpose interrupt signals

10.7 Revision History 



Table 10-4: Revision History

Who          When         Description of change

Mike Uhler   06-Mar-1989  Release for external review.
Mike Uhler   14-Dec-1989  Update for second-pass release.
Ron Preston  09-Jan-1990  Changes to simplify implementation.
Mike Uhler   20-Jul-1990  Update for change to performance monitoring interrupt request and
                          reflect implementation.
Gil Wolrich  15-Nov-1990  NVAX Plus modifications
Gil Wolrich  1-Aug-1991   update

Chapter 11
The Fbox



11.1 Overview 

This chapter provides a high level description of the floating point unit of the NVAX Plus
CPU chip. For a complete specification of the Fbox, refer to the NVAX CPU Chip Functional
Specification.

11.2 Introduction 

The Fbox is the floating point unit in the NVAX Plus CPU chip. The Fbox is a 4 stage pipelined
floating point processor, with an additional stage devoted to assisting division. It interacts with
three different segments of the main CPU pipeline: the microsequencer in S2 and the
Ebox in S3 and S4. The Fbox runs semi-autonomously to the rest of the CPU chip and supports
the following operations:

• VAX Floating Point Instructions and Data Types 

The Fbox provides instruction and data support for VAX floating point instructions. VAX F-, 
D-, and G-floating point data types are supported. 

• VAX Integer Instructions 

The Fbox implements longword integer multiply instructions.

• Pipelined Operation 

Except for all the divide instructions, DIV{F,D,G}, the Fbox can start a new single precision 
floating point instruction every cycle and a double precision floating point or an integer mul- 
tiply instruction every two cycles. The Ebox can supply two 32-bit operands or one 64-bit 
operand to the Fbox every cycle on two 32 bit input operand buses. The Fbox drives the 
result operand to the Ebox on a 32-bit result bus. 

• Conditional "Mini-Round" Operation

Result latency is conditionally reduced by one cycle for the most frequently used instructions. 
Stage 3 can perform a "mini-round" operation on the LSB's of the fraction for all ADD, SUB, 
and MUL floating instructions. If the "mini-round" operation does not fail, then stage 3 drives 
the result directly to the output, bypassing stage 4 and saving a cycle of latency. 

• Fault and Exception Handling 

The Ebox coordinates the fault and exception handling with the Fbox. Any fault or exception 
condition received from the Ebox is retired in the proper order. If the Fbox receives or 
generates any fault or exception condition, it does not change the flow of instructions in 
progress within the Fbox pipe. 



DIGITAL CONFIDENTIAL 






NVAX Plus CPU Chip Functional Specification, Revision 0.3, October 1991 



Figure 11-1 is a top level block diagram of the Fbox showing the six major functional blocks 
within the Fbox and their interconnections. 



Figure 11-1: Fbox block diagram

11.3 Fbox Functional Overview

The Fbox is the floating point accelerator for the NVAX CPU. Its instruction repertoire includes
all VAX base group floating point instructions. The data types that are supported are F, D, and
G. Additional integer instructions that are supported are MULL2 and MULL3.

The number of internal execution cycles and the total number of cycles to complete an instruction
within the Fbox are measured as shown in Figure 11-2.

Figure 11-2: Fbox Execute Cycle Diagram

The internal execution time for all instructions except MUL{D,G,L} and DIV{F,D,G} is four cycles.

The internal execution time of the various Fbox operations is given in Table 11-1.



Table 11-1: Fbox Internal Execute Cycles

INSTRUCTION  F   D   G   L

MUL          4   5   5   5
DIV          14  25  24
ALL OTHER    4   4   4   4



The total number of cycles taken by the Fbox to complete an instruction is given in Table 11-2.
Note that this includes the cycles taken for opcode and operand transfer; in particular, the dead
cycle between the opcode and the first operand is counted.



Table 11-2: Fbox Total Execute Cycles

INSTRUCTION  F   D   G   L

MUL          7   10  10  8
DIV          17  30  29
ALL OTHER    7   9   9
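
Comparing the two tables shows how many cycles are spent outside internal execution. The Python sketch below simply subtracts Table 11-1 from Table 11-2 for the entries populated in both; the dictionary and function names are hypothetical, and the interpretation of the difference as transfer overhead follows from the note that the totals include opcode and operand transfer.

```python
# Cycle counts copied from Table 11-1 (internal) and Table 11-2 (total).
INTERNAL = {
    "MUL": {"F": 4, "D": 5, "G": 5, "L": 5},
    "DIV": {"F": 14, "D": 25, "G": 24},
    "ALL OTHER": {"F": 4, "D": 4, "G": 4, "L": 4},
}
TOTAL = {
    "MUL": {"F": 7, "D": 10, "G": 10, "L": 8},
    "DIV": {"F": 17, "D": 30, "G": 29},
    "ALL OTHER": {"F": 7, "D": 9, "G": 9},
}


def transfer_overhead(instruction, data_type):
    """Return total cycles minus internal execute cycles for one table entry."""
    return TOTAL[instruction][data_type] - INTERNAL[instruction][data_type]
```

For every entry present in both tables the difference is 3 cycles for the F and L columns and 5 cycles for the D and G columns.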





11.3.1 Fbox interface 

This section is responsible for overseeing the protocol with the Ebox. This includes the sequence
of receiving the opcode, operands, exceptions, and other control information, and also outputting
the result with its accompanying status. The opcode and operands are transferred from the input
interface to stage 1 in all operations except division. The result is conditionally received from
either stage 3 or stage 4.

11.3.2 Divider 

The divider receives its inputs from the interface and drives its outputs to stage 1. It is used 
only to assist the divide operation, for which it computes the quotient and the remainder in a 
redundant format. 

11.3.3 Stage 1 

Stage 1 receives its inputs from either the interface or the divider section and drives its outputs 
to stage 2. It is primarily used for determining the difference between the exponents of the two 
operands, subtracting the fraction fields, performing the recoding of the multiplier and forming 
three times the multiplicand, and selecting the inputs to the first two rows of the multiplier array. 

11.3.4 Stage 2 

Stage 2 receives its inputs from stage 1 and drives its outputs to stage 3. Its primary uses are: 
right shifting (alignment), multiplying the fraction fields of the operands, and zero and leading 
one detection of the intermediate fraction results. 

11.3.5 Stage 3 

Stage 3 receives most of its inputs from stage 2 and drives its outputs to stage 4 or, conditionally, 
to the output. Its primary uses are: left shifting (normalization), and adding the fraction fields 
for the aligned operands or the redundant multiply array outputs. This stage can also perform a 
"mini-round" operation on the LSB's of the fraction for ADD, SUB, and MUL floating instructions. 
If the "mini-round" does not overflow, and if there are no possible exceptions, then stage 3 drives 
the result directly to the output, bypassing stage 4 and saving a cycle of latency. 

11.3.6 Stage 4 

Stage 4 receives its inputs from stage 3 and drives its outputs to the interface section. It is used 
for performing the terminal operations of the instruction such as rounding, exception detection 
(overflow, underflow, etc.), and determining the condition codes. 

11.3.7 Fbox Instruction Set 

The instructions listed in Table 11-3 constitute the VAX integer and floating point instructions
supported by the Fbox datapath. 



11-4 The Fbox 



DIGITAL CONFIDENTIAL 



NVAX Plus CPU Chip Functional Specification, Revision 0.3, October 1991 



Table 11-3: Fbox Floating Point and Integer Instructions 
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Table 11-3 (Cont.): Fbox Floating Point and Integer instructions 

                                           CC    CC
FboxOpc  Instruction                       NZVC  MAP  DL  Exceptions

042      SUBF2 sub.rf, dif.mf              **00  10   10  rsv, fov, fuv
043      SUBF3 sub.rf, min.rf, dif.wf      **00  10   10  rsv, fov, fuv
062      SUBD2 sub.rd, dif.md              **00  10   11  rsv, fov, fuv
063      SUBD3 sub.rd, min.rd, dif.wd      **00  10   11  rsv, fov, fuv
142      SUBG2 sub.rg, dif.mg              **00  10   11  rsv, fov, fuv
143      SUBG3 sub.rg, min.rg, dif.wg      **00  10   11  rsv, fov, fuv

0C4      MULL2 mulr.rl, prod.ml            ***0  11   10  iov
0C5      MULL3 mulr.rl, muld.rl, prod.wl   ***0  11   10  iov
044      MULF2 mulr.rf, prod.mf            **00  10   10  rsv, fov, fuv
045      MULF3 mulr.rf, muld.rf, prod.wf   **00  10   10  rsv, fov, fuv
064      MULD2 mulr.rd, prod.md            **00  10   11  rsv, fov, fuv
065      MULD3 mulr.rd, muld.rd, prod.wd   **00  10   11  rsv, fov, fuv
144      MULG2 mulr.rg, prod.mg            **00  10   11  rsv, fov, fuv
145      MULG3 mulr.rg, muld.rg, prod.wg   **00  10   11  rsv, fov, fuv

046      DIVF2 divr.rf, quo.mf
047      DIVF3 divr.rf, divd.rf, quo.wf
066      DIVD2 divr.rd, quo.md
067      DIVD3 divr.rd, divd.rd, quo.wd
146      DIVG2 divr.rg, quo.mg
147      DIVG3 divr.rg, divd.rg, quo.wg

050      MOVF src.rf, dst.wf
070      MOVD src.rd, dst.wd
150      MOVG src.rg, dst.wg
052      MNEGF src.rf, dst.wf
072      MNEGD src.rd, dst.wd
152      MNEGG src.rg, dst.wg

051      CMPF src1.rf, src2.rf
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Table 11-3 (Cont.): Fbox Floating Point and Integer Instructions 



                      CC    CC
FboxOpc  Instruction  NZVC  MAP  DL  Exceptions



CC_MAP: Condition Code Map

00 = No Update

01 = MOV Floating

10 = All Other Floating

11 = Integer



DL: Result Data Length

00 = Byte

01 = Word

10 = Long

11 = Quad



11.3.8 Revision History 



Table 11-4: Revision History

Who           When          Description of change

Anil Jain     17-Mar-1989   Initial Release
Anil Jain     18-Dec-1989   Updated to reflect the Fbox implementation
Gil Wolrich   15-Nov-1990   Retain FBOX overview for NVAX Plus Spec
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Chapter 12 
The Mbox 



12.1 INTRODUCTION 

This chapter contains the high level description of the NVAX Plus MBOX, and specifies the 
changes with respect to PCache Invalidates and external map support. It also includes EBOX 
and CBOX interface descriptions, IPR specifications, and testability features from the NVAX CPU 
Chip Functional Specification. Refer to the NVAX CPU Chip Functional Specification for the detailed
description of the MBOX.

The Mbox performs three primary functions: 

• VAX memory management: The Mbox, in conjunction with the operating system memory 
management software, is responsible for the allocation and use of physical memory. The 
Mbox performs the hardware functions necessary to implement VAX memory management.
It performs translations of virtual addresses to physical addresses, access violation checks 
on all memory references, and initiates the invocation of software memory management code 
when necessary. 

• Reference processing: Due to the macropipeline structure of NVAX Plus, and the coupling 
between NVAX Plus and its memory subsystem, the Mbox can receive memory references 
from the Ibox, Ebox and Cbox(invalidates) simultaneously. Thus, the Mbox is responsible 
for prioritizing, sequencing, and processing all references in an efficient and logically correct 
fashion and for transferring references and their corresponding data to/from the Ibox, Ebox, 
Pcache, and Cbox. 

• Primary Cache Control: The Mbox maintains an 8KB physical address cache of I-stream and 
D-stream data. This cache, called the Pcache (Primary Cache), exists in order to provide a 
two cycle pipeline latency for most I-stream and D-stream data requests. It is the fastest 
D-stream storage medium for NVAX Plus and represents the first level of D-stream memory 
hierarchy and the second level of I-stream memory hierarchy for the NVAX Plus scalar data. 
The Mbox is responsible for controlling Pcache operation. 
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12.2 MBOX STRUCTURE 

This section presents a block diagram of the Mbox and defines the function of the basic Mbox
components. 

The following block diagram illustrates the basic components of the Mbox. 



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Figure 12-1 : Mbox Block Diagram 
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The Mbox is implemented as a two-stage pipeline located in the fifth and sixth segments of the 
NVAX Plus macropipeline (S5 and S6). References processed by the Mbox are first executed in 
S5. Upon successful completion in S5, the reference is transferred into S6. At this point, the 
reference has either completed or is transferred to the Ibox, Ebox, or Cbox. 

During any cycle, the fundamental state of the S5 and S6 stages can be defined by the particular 
references which currently reside in these two stages. For the purposes of describing the Mbox, 
all references can be viewed as a packet of information which is transferred on the S5 and S6 
buses. The S5 reference packet, and the corresponding S5 buses are defined as: 

• ADDRESS: The M_QUE%S5_VA<31:0> bus transfers all virtual addresses and some physical
addresses into the S5 pipe. The M_QUE%S5_PA<31:0> bus transfers some physical addresses
into the S5 pipe and transfers all addresses out of the S5 pipe.

• DATA: M_QUE%S5_DATA<31:0> transfers data originating from the Ebox, through the S5 pipe.

• COMMAND: M_QUE%S5_CMD<4:0> transfers the type of reference through the S5 pipe. This
command field is defined in Section 12.3.1.

• TAG: The M_QUE%S5_TAG<4:0> transfers the Ebox register file destination address corresponding
to the reference through the S5 pipe.

• DEST_BOX: M_QUE%S5_DEST<1:0> transfers the reference destination information through 
the S5 pipe. This field is defined as follows: 



M_QUE%S5_DEST Definition 

00: the reference requests data destined for the Mbox. 

01: the reference requests data destined for the Ibox 

10: the reference requests data destined for the Ebox 

11: the reference requests data destined for the Ebox and Ibox 



• AT: The M_QUE%S5_AT<1:0> transfers the access type of the reference. This field is defined as
follows:



M_QUE%S5_AT Definition 

00: tb passive query access (See PROBE command) 

01: read access 

10: write access 

11: modify access (read with write check for future write to same addr) 

• DL: The M_QUE%S5_DL<1:0> transfers the data length of the reference. This field is defined 
as follows: 



M_QUE%S5_DL Definition

01: byte

00: word

10: longword

11: quadword



• BYTE_MASK: The M_QUE%S5_BM<7:0> transfers the byte mask information out of the S5
pipe.

• REF_QUAL: The M_QUE%S5_QUAL<6:0> transfers information which further qualifies the reference
for the purpose of Mbox processing. This field is defined as follows:



M_QUE%S5_QUAL bit    Definition

M_QUE%S5_QUAL<6>     address of reference is currently a virtual address.

M_QUE%S5_QUAL<5>     reference has been tested for cross-page condition.

M_QUE%S5_QUAL<4>     reference is first part of an unaligned reference.

M_QUE%S5_QUAL<3>     reference is second part of an unaligned reference.

M_QUE%S5_QUAL<2>     enable ACV and M=0 checks.

M_QUE%S5_QUAL<1>     reference has or is forced to have a hard error.

M_QUE%S5_QUAL<0>     reference has or is forced to have a memory management fault (ACV/TNV/M=0).
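
As an illustration of the qualifier encoding, the following sketch decodes a 7-bit S5 qualifier value into named flags. Only the bit positions and their meanings are taken from the table above; the flag names and the function name are hypothetical labels chosen for readability.

```python
from enum import IntFlag


class S5Qual(IntFlag):
    """Bit positions follow the S5 qualifier table; the names are illustrative."""
    MME_FAULT = 1 << 0         # memory management fault (ACV/TNV/M=0)
    HARD_ERROR = 1 << 1        # reference has or is forced to have a hard error
    MME_CHECK_EN = 1 << 2      # enable ACV and M=0 checks
    UNALIGNED_SECOND = 1 << 3  # second part of an unaligned reference
    UNALIGNED_FIRST = 1 << 4   # first part of an unaligned reference
    XPAGE_TESTED = 1 << 5      # tested for cross-page condition
    VIRTUAL = 1 << 6           # address is currently a virtual address


def decode_s5_qual(qual: int) -> S5Qual:
    """Return the flags present in a qualifier value; bits above <6> are ignored."""
    return S5Qual(qual & 0x7F)
```

For example, a value of 0x44 would decode to a virtual address with ACV and M=0 checks enabled.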

The S6 reference packet, and the corresponding S6 buses are defined as: 

• ADDRESS: The M%S6_PA<31:0> bus transfers a physical address through the S6 pipe. 

• DATA: B%S6_DATA<63:0> transfers data through the S6 pipe.

• COMMAND: M%S6_CMD<4:0> transfers the type of reference through the S6 pipe. This command
field is defined in Section 12.3.1.

• TAG: The M_QUE%S6_TAG<4:0> transfers the Ebox register file destination address corresponding
to the reference through the S6 pipe.

• DEST_BOX: M_QUE%S6_DEST<1:0> transfers the reference destination information through
the S6 pipe. This field is defined as follows:



M_QUE%S6_DEST Definition 



00: the reference requests data destined for the Mbox.

01: the reference requests data destined for the Ibox.

10: the reference requests data destined for the Ebox.

11: the reference requests data destined for the Ebox and Ibox.



• BYTE_MASK: M%S6_BYTE_MASK<7:0> transfers the byte mask information through the
S6 pipe. The byte mask field is used to indicate which bytes of a longword or quadword write
should actually be written to a cache or memory.

• REF_QUAL: M_QUE%S6_QUAL<3:0> transfers information which further qualifies the reference
for the purpose of Mbox processing. This field is defined as follows:



M_QUE%S6_QUAL bit    Definition

M_QUE%S6_QUAL<3>     reference is first part of an unaligned reference.

M_QUE%S6_QUAL<2>     reference is second part of an unaligned reference.

M_QUE%S6_QUAL<1>     reference has or is forced to have a hard error.

M_QUE%S6_QUAL<0>     reference has or is forced to have a memory management fault (ACV/TNV/M=0).



12.2.1 EM_LATCH 

The EM_LATCH latches and stores all commands originating from the Ebox. Each reference is 
stored until the following two conditions are satisfied: 1) the "complete logical reference" (i.e.,
the pair of aligned references required if the EM_LATCH reference is unaligned) clears memory
management access checks, and 2) the EM_LATCH reference successfully completes in S5.

A 4-way byte barrel shifter is connected to the data portion of the EM_LATCH. This enables the 
write data to be byte-rotated into longword alignment. The EM_LATCH output can be tristated. 

12.2.2 CBOX_LATCH

The CBOX_LATCH stores references originating from the Cbox. These references are I-stream
Pcache fills, D-stream Pcache fills, or Pcache hexaword invalidates. Each reference is stored until
the reference successfully completes in S5.

Note that no data field is present in this latch even though this latch services cache fill commands.

Cache fill data will be supplied to the Pcache on the B%S6_DATA bus by the Cbox during the
appropriate S6 cache fill cycle. The C%CBOX_ADDR bus is driven by the Cbox during invalidate
commands. During cache fill commands, all but two bits of the C%CBOX_ADDR bus are driven by
the DMISS_LATCH or IMISS_LATCH. The Cbox will drive C%MBOX_FILL_QW<4:3> during cache
fill commands in order to supply the quadword alignment of the fill data within the hexaword
block. The CBOX_LATCH output can be tristated.

12.2.3 TB 

The TB (translation buffer) is the mechanism by which the Mbox performs quick virtual-to- 
physical address translations. It is a 96-entry fully associative cache of PTEs (Page Table Entries). 
Bits 31 through 9 of all S5 virtual addresses act as the TB tag. The replacement algorithm 
implemented is Not-Last-Used. 
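
A minimal software model of such a buffer is sketched below. The 96-entry size, the full associativity, and the VA<31:9> tag come from the text above. The way the Not-Last-Used pointer moves (it steps past an entry whenever that entry is the one just used) is an assumption made for illustration, as are the class and method names; the PTE payload is treated as opaque.

```python
class TranslationBuffer:
    """Illustrative model of a 96-entry fully associative TB (not the hardware design)."""
    ENTRIES = 96

    def __init__(self):
        self.tags = [None] * self.ENTRIES   # VA<31:9> per entry, None means invalid
        self.ptes = [None] * self.ENTRIES   # opaque PTE payload per entry
        self.nlu = 0                        # entry that the next fill will replace

    @staticmethod
    def tag_of(va):
        return (va >> 9) & 0x7FFFFF         # 23 tag bits, VA<31:9>

    def _touch(self, i):
        # Assumed Not-Last-Used policy: never leave the pointer on the entry just used.
        if self.nlu == i:
            self.nlu = (i + 1) % self.ENTRIES

    def lookup(self, va):
        """Return the cached PTE for va, or None on a TB miss."""
        tag = self.tag_of(va)
        for i, t in enumerate(self.tags):
            if t == tag:
                self._touch(i)
                return self.ptes[i]
        return None

    def fill(self, va, pte):
        """Write a translation into the entry selected by the replacement pointer."""
        i = self.nlu
        self.tags[i] = self.tag_of(va)
        self.ptes[i] = pte
        self._touch(i)
```

With this policy the entry that was most recently used is never the next victim, which is all that the name Not-Last-Used guarantees.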

12.2.4 DMISS_LATCH and IMISS_LATCH

The DMISS_LATCH stores the currently outstanding D-stream read. That is, a D-stream read, 
which missed in the Pcache, is stored in the DMISS_LATCH until the corresponding Pcache block
fill operation completes. The DMISS_LATCH also stores IPR_RDs to be processed by the Cbox 
until the Cbox supplies the data. I-stream reads are handled analogously by the IMISS_LATCH 
except that IPR_RDs are never handled by the IMISS_LATCH. 
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These two latches have comparators built in in order to detect the following conditions: 

• For NVAX, if the hexaword address of an invalidate matches the hexaword address stored in
either MISS_LATCH, the corresponding MISS_LATCH sets a bit to indicate that the corresponding
fill operation is no longer cacheable in the Pcache. **NVAX Plus invalidates only
specify index<12:5> and the Pcache set to be invalidated. If the index and MISS_LATCH
allocation bit match an invalidate, the corresponding MISS_LATCH sets a bit to indicate
that the corresponding fill operation is no longer cacheable in the Pcache.**

• Address<11:5> addresses a particular Pcache index (corresponding to two Pcache blocks). If
address<8:5> of the DMISS_LATCH matches the corresponding bits of the physical address
of an S5 I-stream read, the S5 I-stream read is stalled until the entire D-stream fill operation
completes. This prevents a D-stream fill sequence to a given Pcache block from happening
simultaneously with an I-stream fill sequence to the same Pcache block.

• By the same argument, address<8:5> of the IMISS_LATCH is compared against S5 D-stream
reads to prevent another simultaneous I-stream/D-stream fill sequence to the same Pcache
block.

• Address<8:5> of both MISS_LATCHes is compared against any S5 memory write operation. This
is necessary to prevent the write from interfering with the cache fill sequence.
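
The address comparisons in the last three bullets can be sketched as a single predicate. The bit range <8:5> is taken from the text; the function names are hypothetical, and the sketch ignores the validity of the latch, which the real comparator would also have to consider.

```python
def fill_index_bits(addr: int) -> int:
    """Extract address<8:5>, the bits examined by the MISS_LATCH comparators."""
    return (addr >> 5) & 0xF


def conflicts_with_fill(miss_latch_addr: int, s5_pa: int) -> bool:
    """True when an S5 reference matches address<8:5> of an outstanding fill."""
    return fill_index_bits(miss_latch_addr) == fill_index_bits(s5_pa)
```

A reference for which this predicate is true would be stalled until the outstanding fill sequence completes.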



12.2.5 Pcache

The Pcache is a two-way set associative, read allocate, no-write allocate, write through, physical
address cache of I-stream and D-stream data. Some systems may force the Pcache to allocate
such that if address[12]=0 set 0 is loaded, and if address[12]=1 set 1 is loaded, using the Pcache
as if it were direct mapped so that the Pcache can be backmapped exactly as the EV4 Dcache.
The Pcache stores 8192 bytes (8K) of data and 256 tags corresponding to 256 hexaword blocks
(1 hexaword = 32 bytes). Each tag is 20 bits wide corresponding to bits <31:12> of the physical
address. There are four quadword subblocks per block with a valid bit associated with each
subblock. The access size for both Pcache reads and writes is one quadword. Byte parity is
maintained for each byte of data (32 bits per block). One bit of parity is maintained for every
tag. The Pcache has a one cycle access and a one cycle repetition rate for both reads and writes
(note however, that the entire Mbox latency is two cycles due to the two stage Mbox pipeline).



12.3 REFERENCE PROCESSING

This section discusses how references are processed by the Mbox, and how the Mbox functional
components interact to carry out reference processing.



12.3.1 REFERENCE DEFINITIONS 



The following table describes all types of references processed by the Mbox: 



Table 12-1: Reference Definitions 



Name           Value (hex)  Reference Source   Description

IREAD          0E           Ibox               Aligned quadword I-stream read

DREAD          1C           Ibox, Ebox, Mbox   Variable length D-stream read

DREAD_MODIFY   1D           Ibox               Variable length D-stream read with modify intent
                                               as a result of Ibox-decoded modify specifiers

DREAD_LOCK     1F           Ebox               Variable length D-stream read with atomic
                                               memory lock

WRITE_UNLOCK   1A           Ebox               Variable length write with atomic memory unlock

WRITE          1B           Ebox               Variable length write

DEST_ADDR      1D           Ibox               Supplies address of a write-only destination
                                               specifier

STORE          19           Ebox               Supplies write data corresponding to a previously
                                               translated destination specifier address.

IPR_WR         06           Ebox               Internal Processor Register Write

IPR_RD         07           Ebox               Internal Processor Register Read

IPR_DATA       04           Mbox               Transfers Mbox IPR data to Ebox

LOAD_PC        05           Ebox               Transfers a PC value to Ibox via M%MD_BUS<31:0>

PROBE          09           Ebox               Mbox returns ACV/TNV/M=0 status of specified
                                               address to Ebox.

MMECHK         08           Ebox, Mbox         Performs ACV/TNV/M=0 check on specified address
                                               and invokes the appropriate memory management
                                               exception

TB_TAG_FILL    0C           Ebox, Mbox         Writes a TB tag into a TB entry.

TB_PTE_FILL    14           Ebox, Mbox         Writes PTE data into a TB entry.

TBIS           10           Ebox               Invalidates a specific PTE entry in the TB.

TBIA           18           Ebox, Mbox         Invalidates all entries in TB.

TBIP           11           Ebox               Invalidates all PTE entries in TB corresponding
                                               to process-space translations.

D_CF           03           Cbox               D-stream quadword Pcache fill

I_CF           02           Cbox               I-stream quadword Pcache fill

INVAL          01           Cbox               Hexaword invalidate of a Pcache entry

STOP_SPEC_Q    0F           Ibox               Stops processing of specifier references.

NOP            00           Ibox, Ebox, Mbox   No operation

12.3.2 Arbitration Algorithm

Since Cbox references always want to be processed immediately, a validated CBOX_LATCH always
causes the Cbox reference to be driven before all other pending references.

A validated RTY_DMISS_LATCH, MME_LATCH, and VAP_LATCH have priority over the
EM_LATCH.
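
The two priority rules above can be sketched as a fixed-priority selection. Only two facts come from the text: a validated CBOX_LATCH is always driven first, and the EM_LATCH yields to the RTY_DMISS_LATCH, MME_LATCH, and VAP_LATCH. The relative order chosen here among those three sources is an assumption, and sources not named in this section are left out.

```python
# Highest priority first. The order of the three middle entries is assumed;
# the text only states that each of them has priority over EM_LATCH.
PRIORITY = ("CBOX_LATCH", "RTY_DMISS_LATCH", "MME_LATCH", "VAP_LATCH", "EM_LATCH")


def arbitrate(valid):
    """Return the name of the source to drive into S5, or None if none is valid.

    `valid` is a collection of source names whose latches are currently validated.
    """
    for source in PRIORITY:
        if source in valid:
            return source
    return None
```

For instance, when both the CBOX_LATCH and the EM_LATCH are validated, the Cbox reference is selected.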

12.4 READS 

12.4.1 Generic Read-hit and Read-miss/Cache_fill Sequences

In order to orient the reader as to how memory reads are processed by the Mbox, this section will 
describe the "vanilla" read sequence. It does not discuss reads which TB_MISS, or otherwise are 
stalled for a variety of different reasons. 

The byte mask generator generates the corresponding byte mask by looking at M_QUE%S5_VA<2:0>
and M_QUE%S5_DL<1:0> and then drives the byte mask onto M_QUE%S5_BM<7:0>. Byte mask data
is generated on a read operation in order to supply the byte alignment information to the Cbox 
on an I/O space read. 
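
A possible model of the byte mask generator is sketched below. It assumes that bit i of the mask corresponds to byte i of the aligned quadword containing the address, and it takes the data length in bytes rather than as an encoded DL value; both choices, and the function name, are assumptions made for illustration. Bytes that would fall beyond the quadword are dropped, on the assumption that such a reference has already been split into aligned parts.

```python
def byte_mask(va: int, length_bytes: int) -> int:
    """8-bit mask with one bit per byte of the aligned quadword that holds `va`."""
    offset = va & 0x7                           # VA<2:0>: byte offset in the quadword
    mask = ((1 << length_bytes) - 1) << offset  # one bit per referenced byte
    return mask & 0xFF                          # bytes past the quadword are dropped
```

A longword reference at an address ending in 4, for example, selects the upper four bytes of the quadword.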

When a read reference is initiated in the S5 pipe, the address is translated by the TB (assuming 
the address was virtual) to a physical address during the first half of the S5 cycle. The Pcache 
initiates a cache lookup sequence using this physical address during the second half of the S5 
cycle. This cache access sequence overlaps into the following S6 cycle. During phase four of the 
S6 cycle, the Pcache determines whether the read reference is present in its array. 

If the Pcache determined that the requested data is present, a "cache hit" or "read hit" condition 
occurs. In this event, the Pcache drives the requested data onto B%S6_DATA<63:0>. The signal,
M%CBOX_REF_ENABLE, is de-asserted to inform the Cbox that it should supply the data from the 
Pcache. 
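
The hit determination can be illustrated with a small model built from the Pcache geometry given in Section 12.2.5: two sets, a 7-bit index from address<11:5>, a 20-bit tag from address<31:12>, and one valid bit per quadword subblock selected by address<4:3>. The class and method names are hypothetical, parity is not modelled, and the caller chooses the set on a fill because the allocation policy is outside the scope of this sketch.

```python
class Pcache:
    """Illustrative tag and valid-bit model of the Pcache (data array omitted)."""
    SETS, INDEXES, SUBBLOCKS = 2, 128, 4    # 2 x 128 x 32 bytes = 8192 bytes

    def __init__(self):
        # tags[set][index] holds address<31:12>; valid has one flag per quadword.
        self.tags = [[None] * self.INDEXES for _ in range(self.SETS)]
        self.valid = [[[False] * self.SUBBLOCKS for _ in range(self.INDEXES)]
                      for _ in range(self.SETS)]

    @staticmethod
    def split(pa):
        """Return (tag, index, quadword) = (pa<31:12>, pa<11:5>, pa<4:3>)."""
        return (pa >> 12) & 0xFFFFF, (pa >> 5) & 0x7F, (pa >> 3) & 0x3

    def fill(self, pa, set_select):
        """Validate the quadword subblock addressed by pa in the chosen set."""
        tag, index, qw = self.split(pa)
        if self.tags[set_select][index] != tag:   # new block: drop old subblocks
            self.tags[set_select][index] = tag
            self.valid[set_select][index] = [False] * self.SUBBLOCKS
        self.valid[set_select][index][qw] = True

    def hit(self, pa):
        """True when either set holds a matching tag with the subblock valid."""
        tag, index, qw = self.split(pa)
        return any(self.tags[s][index] == tag and self.valid[s][index][qw]
                   for s in range(self.SETS))
```

A read hits only when the tag matches and the addressed quadword subblock is valid, so a partially filled block can still miss.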

If the Pcache determined that the requested data is not present, a "cache miss" or "read miss"
condition occurs. In this event, the read reference is loaded into the IMISS_LATCH or DMISS_LATCH
(depending on whether the read was I-stream or D-stream) and the Cbox is instructed to
continue processing the read by the Mbox assertion of M%CBOX_REF_ENABLE. At some point later,
the Cbox obtains the requested data. The Cbox will then send four quadwords of data using the
I_CF (I-stream cache fill) or D_CF (D-stream cache fill) commands. The four cache fill commands
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together are used to fill the entire Pcache block corresponding to the hexaword read address. 
In the case of D-stream fills, one of the four cache fill commands will be qualified with C%REQ_DQW,
indicating that this quadword fill contains the requested D-stream data corresponding to
the quadword address of the read. When this fill is encountered, it will be used to supply the
requested read data to the Mbox, Ibox and/or Ebox.

If the requested data is returned to the CBOX with a dRAck response indicating the data is not to be
placed in the Pcache, the CBOX windows the fill commands with C%DRACK_NO_CACHE_H, causing the
read block not to be allocated.

If, however, the physical address corresponding to the I_CF or D_CF command falls into I/O 
space, only one quadword fill is returned and the data is not cached in the Pcache. Only memory 
data is cached in the Pcache. 

Each cache fill command sent to the Mbox is latched in the CBOX_LATCH. Note that neither 
the entire cache fill address nor the fill data are loaded into the CBOX_LATCH. The address in 
the IMISS_LATCH or DMISS_LATCH, together with two quadword alignment bits latched in the 
CBOX_LATCH are used to create the quadword cache fill address when the cache fill command 
is executed in S5. When the fill operation propagates into S6, the Cbox drives the corresponding 
cache fill data onto B%S6_DATA<63:0> in order for the Pcache to perform the fill.
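
The formation of the quadword fill address can be sketched as follows. The hexaword address is taken from the miss latch and the two quadword-select bits, address<4:3>, are the ones latched from the Cbox. The function names are hypothetical, and the order in which the four quadwords are returned is an assumption.

```python
def fill_address(miss_latch_pa: int, fill_qw: int) -> int:
    """Combine the hexaword address held in a MISS_LATCH with the two
    quadword-select bits supplied by the Cbox (address bits <4:3>)."""
    hexaword_base = miss_latch_pa & ~0x1F       # clear byte-in-block bits <4:0>
    return hexaword_base | ((fill_qw & 0x3) << 3)


def fill_sequence(miss_latch_pa: int, order=(0, 1, 2, 3)):
    """Addresses of the four quadword fills; the return order is assumed."""
    return [fill_address(miss_latch_pa, qw) for qw in order]
```

The four addresses together cover the 32-byte block that contains the original read address.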

12.4.1.1 Returning Read Data 

Data resulting from a read operation is driven on B%S6_DATA by the Pcache (in the cache hit case)
or by the Cbox (in the cache miss case). This data is then driven on M%MD_BUS<63:0> by the
MD_BUS_ROTATOR in right-justified form. The signals M%VIC_DATA, M%IBOX_DATA, M%IBOX_IPR_WR,
M%EBOX_DATA, and M%MBOX_DATA are conditionally asserted with the data to indicate the
destination(s) of the data.

In order to return the requested read data to the Ibox and/or Ebox as soon as possible, the Cbox 
implements a Pcache Data Bypass mechanism. When this mechanism is invoked, the requested 
read data can be returned one cycle earlier than when the data is driven for the S6 cache fill 
operation. The bypass mechanism works by having the Mbox inform the Cbox that the next S6 
cycle will be idle, and thus the B%S6_DATA bus will be available to the Cbox. When the Cbox is 
informed of the S6 idle cycle, it drives the B%S6_DATA bus with the requested read data if read 
data is currently available (if no read data is available during a bypass cycle, the Cbox drives
some indeterminate data and no valid data is bypassed). The read data is then formatted by
the MD_BUS_ROTATOR and transferred onto the M%MD_BUS to be returned to the Ibox and/or
Ebox, qualified by M%VIC_DATA, M%IBOX_DATA, and/or M%EBOX_DATA.

12.4.2 D-stream Read Processing 

A DREAD_LOCK command always forces a Pcache read miss sequence regardless of whether 
the referenced data was actually stored in the Pcache. This is necessary in order that the read 
propagate out to the Cbox so that the memory lock/unlock protocols can be properly processed.

12.4.3 I/O Space Reads

I/O space reads are defined as reads which address I/O space. Therefore, a read is an I/O read
when the physical address bits, addr<31:29>, are set. I/O space reads are treated by the Mbox 
in exactly the same way as any other read, except for the following differences: 

• I/O space data is never cached in the Pcache. Therefore, an I/O space read always generates 
a read-miss sequence and causes the Cbox to process the reference. 

• Unlike a memory space miss sequence, which returns a hexaword of data via four I_CF or
D_CF commands, an I/O space read returns only one piece of data via one I_CF or D_CF
command. Thus the Cbox always asserts C%LAST_FILL on the first and only I_CF or D_CF
I/O space operation. If the I/O space read is D-stream, the returned D_CF data is always less 
than or equal to a longword in length. 

• I/O space D-stream reads are never prefetched ahead of Ebox execution. An I/O space D- 
stream read issued from the Ibox is only processed when the Ebox is known to be stalling on 
that particular I/O space read. 

NVAX RESTRICTION 

I-stream I/O space reads must return a quadword of data. Execution of an I-stream 
I/O space read which does not return a quadword of data is unpredictable.
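
The I/O space test and its effect on the fill sequence can be sketched as below. The check on addr<31:29> and the fill counts (one for I/O space, four for memory space) come from this section; the function names are hypothetical.

```python
def is_io_space(pa: int) -> bool:
    """True when physical address bits <31:29> are all set."""
    return ((pa >> 29) & 0x7) == 0x7


def expected_fill_count(pa: int) -> int:
    """Memory-space misses return four quadword fills; I/O space reads return one."""
    return 1 if is_io_space(pa) else 4
```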

12.5 WRITES 

All writes are initiated by the Mbox on behalf of the Ebox. The Ebox microcode is capable of 
generating write references with data lengths of byte, word, longword, or quadword. With the 
exception of cross-page checks, the Mbox treats quadword write references as longword write 
references because the Ebox datapath only supplies a longword of data per cycle. Ebox writes 
can be unaligned. 

The Mbox performs the following functions during a write reference: 

• Memory Management checks: The Mbox checks to be sure the page or pages referenced have 
the appropriate write access and that the valid virtual address translations are available. 
(See Section 12.12.)

• The supplied data is properly rotated to the memory aligned longword boundary. 

• Byte Mask Generation: The Mbox generates the byte mask of the write reference by exam- 
ining the write address and the data length of the reference. 

• Pcache writes: The Pcache is a write-through cache. Therefore, writes are only written into
the Pcache if the write address matches a validated Pcache tag entry. 

The one exception to this rule is when the Pcache is configured in force D-stream hit mode. 
In this mode, the data is always written to the Pcache regardless of whether the tag matches 
or mismatches. 

• All write references which pass memory management checks are transferred to the Cbox 
via B%S6_DATA<63:0>. The Cbox is responsible for processing writes in the Bcache and for
controlling the protocols related to the write-back memory subsystem. 
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When write data is latched in the EM_LATCH, the 4-way byte barrel shifter associated with the 
EM_LATCH rotates the EM_LATCH data into proper alignment based on the lower two bits of
the corresponding address. The diagram below illustrates the barrel shifter function: 



Figure 12-2: Barrel Shifter Function 



original                  +---+---+---+---+
4 bytes of                | 4 | 3 | 2 | 1 |
Ebox data                 +---+---+---+---+

barrel shifter            +---+---+---+---+
output when               | 3 | 2 | 1 | 4 |
M_QUE%S5_VA<1:0> = 01     +---+---+---+---+

barrel shifter            +---+---+---+---+
output when               | 2 | 1 | 4 | 3 |
M_QUE%S5_VA<1:0> = 10     +---+---+---+---+

barrel shifter            +---+---+---+---+
output when               | 1 | 4 | 3 | 2 |
M_QUE%S5_VA<1:0> = 11     +---+---+---+---+



The result of this data rotation is that all bytes of data are now in the correct byte positions 
relative to memory longword boundaries. 
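
The rotation can be modelled as a byte-wise rotate of a 32-bit value. The sketch assumes that byte 1 in Figure 12-2 is the least significant byte and that the shifter rotates left by the number of byte positions given by the low two address bits, which is how the legible part of the figure reads; the function name is hypothetical.

```python
def barrel_shift(data: int, va: int) -> int:
    """Rotate a 32-bit longword left by VA<1:0> byte positions."""
    shift = 8 * (va & 0x3)
    data &= 0xFFFFFFFF
    return ((data << shift) | (data >> (32 - shift))) & 0xFFFFFFFF
```

With bytes numbered 4, 3, 2, 1 from most to least significant, an address ending in binary 01 yields the byte order 3, 2, 1, 4.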

When write data is driven from the EM_LATCH, M_QUE%S5_DATA<31:0> is driven by the output
of the barrel shifter so that data will always be properly aligned to memory longword addresses.

Note that, while the M_QUE%S5_DATA bus is a longword wide, the B%S6_DATA bus is a quadword
wide. B%S6_DATA is a quadword wide due to the quadword Pcache access size. The quadword access
size facilitates Pcache and VIC fills. However for all writes, at most half of B%S6_DATA<63:0>
is ever used to write the Pcache since all write commands modify a longword or less of data. When
a write reference propagates from S5 to S6, the longword aligned data on M_QUE%S5_DATA<31:0>
is transferred onto both the upper and lower halves of B%S6_DATA<63:0> to guarantee that the
data is also quadword aligned to the Pcache and Cbox. The byte mask corresponding to the
reference will control which bytes of B%S6_DATA<63:0> actually get written into the Pcache or
Bcache.
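
The replication of the longword and the effect of the byte mask can be sketched as follows. The sketch assumes that bit i of the byte mask selects byte i of the quadword, with byte 0 the least significant; the function names are hypothetical.

```python
def to_s6_data(s5_data: int) -> int:
    """Copy the aligned longword onto both halves of the 64-bit S6 data bus."""
    lw = s5_data & 0xFFFFFFFF
    return (lw << 32) | lw


def masked_write(old_qw: int, s6_data: int, byte_mask: int) -> int:
    """Merge only the bytes selected by the 8-bit byte mask into a stored quadword."""
    result = old_qw
    for byte in range(8):
        if byte_mask & (1 << byte):
            lane = 0xFF << (8 * byte)
            result = (result & ~lane) | (s6_data & lane)
    return result
```

Because both halves of the bus carry the same longword, the byte mask alone decides whether the upper or the lower longword of the stored quadword is updated.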

Write references are formed through two distinct mechanisms described below. 

12.5.1 Writes to I/O Space 

I/O space writes are defined as a write command which addresses I/O space. Therefore, a write 
is an I/O space write when the physical address bits, addr<31:29>, are set. I/O space writes 
are treated by the Mbox in exactly the same way as any other write, except for the following 
differences: 

• I/O space data is never cached in the Pcache; therefore, an I/O space write always misses in 
the Pcache. 
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12.6 IPR PROCESSING 

12.6.1 MBOX IPRs 

The Mbox maintains the following internal processor registers: 

Table 12-2: Mbox IPRs 

                                                              IPR Address
Register Name                                                 (in hex)

MP0BR (Mbox P0 Base Register) [1]                             E0

MP0LR (Mbox P0 Length Register) [1]                           E1

MP1BR (Mbox P1 Base Register) [1]                             E2

MP1LR (Mbox P1 Length Register) [1]                           E3

MSBR (Mbox System Base Register) [1]                          E4

MSLR (Mbox System Length Register) [1]                        E5

MMAPEN (Map Enable Bit) [1]                                   E6

PAMODE (Address Mode)                                         E7

MMEADR (MME Faulting Address Register) [1]                    E8

MMEPTE (PTE Address Register) [1]                             E9

MMESTS (status of memory management exception) [1]            EA

TBADR (address of reference causing TB parity error)          EC

TBSTS (status of TB parity error)                             ED

PCADR (address of reference causing Pcache parity error)      F2

PCSTS (status of Pcache parity error and PTE hard errors)     F4

PCCTL (control state of Pcache operation)                     F8

PCTAG                                                         01800000..01801FE0

PCDAP                                                         01C00000..01C01FF8



[1] Testability and diagnostic use only; not for software use in normal operation.



The first thirteen IPRs listed above (memory management IPRs) are stored in the S5 pipe in 
the register file of the MME_DATAPATH. All other IPRs are stored in the S6 pipe. Note that
when an Mbox IPR, other than a Pcache tag, is addressed, the actual IPR address is received on 
M_QUE%S5_VA<9:2> (the table above is written such that all addresses start at bit<0>). 
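
The relationship between the table addresses and the address bus can be sketched as a pair of helper functions. The bit range <9:2> comes from the note above; the function names are hypothetical, and the sketch does not apply to Pcache tag addresses, which the text excludes.

```python
def ipr_number(s5_va: int) -> int:
    """Recover the IPR address carried on VA<9:2> of the S5 address bus."""
    return (s5_va >> 2) & 0xFF


def ipr_va(ipr: int) -> int:
    """Place an IPR address from the table onto VA<9:2> (low two bits zero)."""
    return (ipr & 0xFF) << 2
```

For example, the table address E6 would appear on the address bus as the value 398 (hex).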

The following is the format description of each Mbox IPR. 
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Figure 12-3: MPOBR Register 



 31 30 29                                          09 08       00
+--+--+----------------------------------------------+-----------+
| 1| 0| system virtual page address of P0 page table |     0     | :MP0BR
+--+--+----------------------------------------------+-----------+



Figure 12-4: MP0LR Register



 31       22 21                                  00
+-----------+--------------------------------------+
|     0     | length of P0 page table in longwords | :MP0LR
+-----------+--------------------------------------+



Figure 12-5: MP1BR Register



 31 30 29                                          09 08       00
+--+--+----------------------------------------------+-----------+
| 1| 0| system virtual page address of P1 page table |     0     | :MP1BR
+--+--+----------------------------------------------+-----------+



Figure 12-6: MP1LR Register



 31       22 21                                            00
+-----------+------------------------------------------------+
|     0     | (2**21) - length of P1 page table in longwords | :MP1LR
+-----------+------------------------------------------------+



Figure 12-7: MSBR Register



 31                                        09 08       00
+--------------------------------------------+-----------+
| physical page address of system page table |     0     | :MSBR
+--------------------------------------------+-----------+
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Figure 12-8: MSLR Register 



 31       22 21                                      00
+-----------+------------------------------------------+
|     0     | length of system page table in longwords | :MSLR
+-----------+------------------------------------------+



Figure 12-9: MMAPEN Register



 31                                    01 00
+----------------------------------------+--+
|                    0                   | M| :MMAPEN
+----------------------------------------+--+



Table 12-3: MMAPEN Definition

Name   Bit(s)   Type   Description

M      0        RW,0   When 0, disables Mbox memory management. When 1, enables
                       Mbox memory management.



Figure 12-10: PAMODE Register 



 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0|  | :PAMODE
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
                                                                                              |
                                                                                       MODE --+



Table 12-4: PAMODE Definition

Name  Bit(s)  Type  Description

MODE  0       RW,0  When 0, maps addresses from a 30-bit physical address space. When
                    1, maps addresses from a 32-bit physical address space.
Figure 12-11: MMEADR Register

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                          address associated with recorded MME fault                           | :MMEADR
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+



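Figure 12-12: MMEPTE Register

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|            PTE address associated with an address corresponding to a modify fault             | :MMEPTE
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+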



Figure 12-13: MMESTS Register 



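 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|  LOCK  |  SRC   | 0| 0| 0| 0| 0| 0| 0| 0| 0| 0|FAULT| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| M|  |LV| :MMESTS
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
                                                                                           |
                                                                                 PTE_REF --+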



Table 12-5: MMESTS Register Definition

Name     Bit(s)  Type  Description

LV       0       RO,0  Indicates ACV fault occurred due to length violation.

PTE_REF  1       RO    Indicates ACV/TNV fault occurred on PTE reference corresponding
                       to MMEADR.

M        2       RO    Indicates corresponding reference had write or modify intent.

FAULT    15:14   RO    Indicates nature of memory management fault. See FAULT encodings
                       below.

SRC      28:26   RO    Complemented shadow copy of LOCK bits. However, the SRC bits
                       do not get reset when the LOCK bits are cleared.

LOCK     31:29   RO    Indicates the lock status of MMESTS. See LOCK encodings below.
                       This field is cleared on E%FLUSH_MBOX.
Table 12-6: FAULT Encodings

Defined FAULT values (binary)  Definition

01                             ACV Fault. This is the highest priority fault in the presence of
                               multiple simultaneous faults.

10                             TNV Fault. This is the next highest priority fault.

11                             M=0 Fault. This is the lowest priority fault.


Table 12-7: LOCK Encodings

Defined LOCK values (binary)  Definition

000                           MMESTS, MMEADR and MMEPTE are unlocked.

001                           valid IREAD fault is stored (no other IREAD fault can overwrite
                              MMESTS, MMEADR, or MMEPTE).

011                           valid Ibox specifier fault is stored (only an Ebox reference fault
                              can overwrite MMESTS, MMEADR, or MMEPTE).

111                           valid Ebox fault is stored (MMESTS, MMEADR, and MMEPTE are
                              completely locked).



Note that the encodings for the SRC bits are the complemented version of the LOCK bits.
Thus, for example, a fully locked SRC encoding is 000. 
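
As an illustration of how an error handler or simulator might interpret these encodings, the following Python sketch splits a 32-bit MMESTS value into its fields. The function name `decode_mmests` and the layout of the returned dictionary are assumptions made for this example and are not part of the NVAX Plus design; the bit positions and encodings are taken from Table 12-5, Table 12-6, and Table 12-7.

```python
# Illustrative sketch only: decode_mmests and its dictionary layout are
# hypothetical. Field positions follow Tables 12-5 through 12-7.
FAULT_NAMES = {0b01: "ACV", 0b10: "TNV", 0b11: "M=0"}
LOCK_NAMES = {
    0b000: "unlocked",
    0b001: "IREAD fault",
    0b011: "Ibox specifier fault",
    0b111: "Ebox fault",
}


def decode_mmests(value):
    """Split a 32-bit MMESTS value into its documented fields."""
    fault = (value >> 14) & 0b11   # FAULT, bits 15:14
    lock = (value >> 29) & 0b111   # LOCK, bits 31:29
    return {
        "LV": value & 1,                  # bit 0
        "PTE_REF": (value >> 1) & 1,      # bit 1
        "M": (value >> 2) & 1,            # bit 2
        "FAULT": FAULT_NAMES.get(fault, "undefined"),
        "SRC": (value >> 26) & 0b111,     # bits 28:26, complement of LOCK
        "LOCK": LOCK_NAMES.get(lock, "undefined"),
    }
```

Encodings that the tables do not define are reported as "undefined" rather than guessed.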



Figure 12-14: TBADR Register 



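 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                 virtual address associated with the recorded TB parity error                  | :TBADR
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+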



Figure 12-15: TBSTS Register 



 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|  SRC   | 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0|     CMD      |  |  |  |  | :TBSTS
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
                                                                                     |  |  |  |
                                                                            EM_VAL --+  |  |  |
                                                                                TPERR --+  |  |
                                                                                   DPERR --+  |
                                                                                       LOCK --+
Table 12-8: TBSTS Description

Name    Bit(s)  Type  Description

LOCK    0       WC,0  Lock Bit. When set, validates TBSTS contents and prevents any
                      other field from further modification. When clear, indicates that no
                      TB parity error has been recorded and allows TBSTS and TBADR
                      to be updated.

DPERR   1       RO    Data Error Bit. When set, indicates a TB data parity error.

TPERR   2       RO    Tag Error Bit. When set, indicates a TB tag parity error.

EM_VAL  3       RO    EM_LATCH valid bit. Indicates if EM_LATCH was valid at the time
                      of the TB parity error detection. This helps the software error
                      handler determine if a write operation may have been lost due to
                      the TB parity error.

CMD     8:4     RO    S5 command corresponding to TB parity error.

SRC     31:29   RO    Indicates the original source of the reference causing TB parity error.



Table 12-9: SRC Encodings 

Defined SRC values Definition 

111 valid Mbox reference error is stored 

110 valid IREAD error is stored 

100 valid Ibox specifier reference error is stored 

000 valid Ebox reference error is stored 
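
The TBSTS layout can be decoded in the same way. The sketch below is a hypothetical helper (`decode_tbsts` is not a name used by the hardware or microcode) that applies the field positions of the TBSTS description table and the SRC encodings of Table 12-9. It returns `None` when LOCK is clear, on the stated rule that the remaining fields are only valid while LOCK is set.

```python
# Illustrative sketch only: decode_tbsts is a hypothetical helper name.
SRC_NAMES = {
    0b111: "Mbox reference",
    0b110: "IREAD",
    0b100: "Ibox specifier reference",
    0b000: "Ebox reference",
}


def decode_tbsts(value):
    """Decode TBSTS; return None if LOCK (bit 0) is clear."""
    if not value & 1:
        return None  # no TB parity error recorded, other fields invalid
    return {
        "DPERR": bool(value & (1 << 1)),    # TB data parity error
        "TPERR": bool(value & (1 << 2)),    # TB tag parity error
        "EM_VAL": bool(value & (1 << 3)),   # EM_LATCH was valid
        "CMD": (value >> 4) & 0x1F,         # S5 command, bits 8:4
        "SRC": SRC_NAMES.get((value >> 29) & 0b111, "undefined"),
    }
```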



Figure 12-16: PCADR Register 



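 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|      quadword physical address associated with the recorded Pcache parity error      | 0| 0| 0| :PCADR
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+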
Figure 12-17: PCSTS Register

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1|  |  |     CMD      |  |  |  |  | :PCSTS
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
                                                                |  |                 |  |  |  |
                                                       PTE_ER --+  |                 |  |  |  |
                                                       PTE_ER_WR --+                 |  |  |  |
                                                                         LEFT_BANK --+  |  |  |
                                                                           RIGHT_BANK --+  |  |
                                                                                   DPERR --+  |
                                                                                       LOCK --+



Table 12-10: PCSTS Description

Name        Bit(s)  Type  Description

LOCK        0       WC,0  Lock Bit. When set, validates PCSTS<8:1> contents and prevents
                          modification of these fields. When clear, invalidates PCSTS<8:1>
                          and allows these fields and PCADR to be updated.

DPERR       1       RO    Data Error Bit. When set, indicates a Pcache data parity error.

RIGHT_BANK  2       RO    Right Bank Tag Error Bit. When set, indicates a Pcache tag parity
                          error on the right bank.

LEFT_BANK   3       RO    Left Bank Tag Error Bit. When set, indicates a Pcache tag parity
                          error on the left bank.

CMD         8:4     RO    S6 command corresponding to Pcache parity error.

PTE_ER_WR   9       WC,0  Indicates a hard error on a PTE DREAD which resulted from a TB
                          miss on a WRITE or WRITE_UNLOCK.

PTE_ER      10      WC,0  Indicates a hard error on a PTE DREAD.



Note that the state of PCSTS<31:11> are "don't cares" during an IPR write operation. 



Figure 12-18: PCCTL Register 



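 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1|  |  |  PMM   |  |  |  |  |  | :PCCTL
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
                                                                   |  |           |  |  |  |  |
                                                      RED_ENABLE --+  |           |  |  |  |  |
                                                       ELEC_DISABLE --+           |  |  |  |  |
                                                                       P_ENABLE --+  |  |  |  |
                                                                          BANK_SEL --+  |  |  |
                                                                            FORCE_HIT --+  |  |
                                                                                I_ENABLE --+  |
                                                                                   D_ENABLE --+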
Table 12-11: PCCTL Definition

Name          Bit(s)  Type  Description

D_ENABLE      0       RW,0  When set, enables Pcache for all INVAL operations and for all
                            D-stream read/write/fill operations, qualified by other control bits.
                            When clear, forces a Pcache miss on all Pcache D-stream read/write/fill
                            operations. Note, however, that an ACV/TNV/M=0 condition overrides
                            a deasserted D_ENABLE in that it will force a Pcache hit
                            condition with D_ENABLE=0.

I_ENABLE      1       RW,0  When set, enables Pcache processing of INVAL, IREAD and I_CF
                            commands. When clear, forces a Pcache miss on IREAD operations
                            and prevents state modification due to an I_CF operation.

FORCE_HIT     2       RW,0  When set, forces a Pcache hit on all reads and writes when Pcache
                            is enabled for I or D-stream operation.

BANK_SEL      3       RW,0  When set with FORCE_HIT=1, selects the "right bank" of the
                            addressed Pcache index. When clear with FORCE_HIT=1, selects the
                            "left bank" of the addressed Pcache index. BANK_SEL is a don't
                            care when FORCE_HIT=0. NOTE: BANK_SEL never affects bank
                            selection during IPR reads and IPR writes to the Pcache tags or
                            Pcache data parity bits; bank selection for these commands is always
                            determined by the specified IPR address.

P_ENABLE      4       RW,0  When set, enables detection of Pcache tag and data parity errors.
                            When deasserted, disables Pcache parity error detection.

PMM           7:5     RW,0  Specifies Mbox performance monitor mode (see Section 12.17). Note
                            that this field does not control or affect the operation of the Pcache
                            in any way. PMM is placed in PCCTL for the convenience of the
                            hardware implementation.

ELEC_DISABLE  8       RW,0  When set, the Pcache is disabled electrically to reduce power
                            dissipation. NOTE: This bit should only be set when the Pcache is
                            functionally turned off by the deassertion of both I_ENABLE and
                            D_ENABLE. UNPREDICTABLE operation will result when this bit
                            is set when either I_ENABLE or D_ENABLE is also set. Also note
                            that Pcache tag or parity IPRs will not function properly when this
                            bit is unconditionally set.

RED_ENABLE    9       RO    When set, indicates that one or more Pcache redundancy elements
                            are enabled (see Section 12.11 for more information).



Note that the state of PCCTL<31:10> are "don't cares" during an IPR write operation. 
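
The constraints in the PCCTL definition can be captured in a small helper that builds the value for an IPR write. The following Python sketch is illustrative only: `encode_pcctl` and its keyword arguments are hypothetical names, bits 31:10 are written as zero because they are don't-cares on a write, and RED_ENABLE is omitted because it is read-only. The check reflects the documented rule that ELEC_DISABLE must not be set while I_ENABLE or D_ENABLE is set.

```python
# Illustrative sketch only: encode_pcctl and its argument names are
# hypothetical. Bit positions follow the PCCTL definition table.
def encode_pcctl(d_enable=False, i_enable=False, force_hit=False,
                 bank_sel=False, p_enable=False, pmm=0, elec_disable=False):
    """Build a PCCTL write value from the writable control fields."""
    if not 0 <= pmm <= 0b111:
        raise ValueError("PMM is a 3-bit field")
    if elec_disable and (d_enable or i_enable):
        # The specification calls this combination UNPREDICTABLE.
        raise ValueError("ELEC_DISABLE requires I_ENABLE and D_ENABLE clear")
    return (int(d_enable)
            | int(i_enable) << 1
            | int(force_hit) << 2
            | int(bank_sel) << 3
            | int(p_enable) << 4
            | pmm << 5
            | int(elec_disable) << 8)
```

For example, enabling the Pcache for both streams with parity checking would use `encode_pcctl(d_enable=True, i_enable=True, p_enable=True)`.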
Figure 12-19: PCTAG Register 



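 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                            tag                            | 1| 1| 1| 1| 1| 1| P|valid bits | A| :PCTAG
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+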
Table 12-12: Pcache Tag IPR Format

Name        Bit(s)  Type  Description

A           0       RW    Allocation Bit corresponding to index of this tag.

valid bits  4:1     RW    Valid Bits corresponding to the four data subblocks. PCTAG<4>
                          corresponds to uppermost quadword in block. PCTAG<1> corresponds
                          to lowermost quadword in block.

P           5       RW    Even Tag Parity

tag         31:12   RW    Tag Data



Note that the state of PCTAG<11:6> are "don't cares" during an IPR write operation. 



Figure 12-20: PCDAP Register 



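 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1|      DATA_PARITY      | :PCDAP
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+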



Table 12-13: Pcache Data Parity IPR Format

Name         Bit(s)  Type  Description

DATA_PARITY  7:0     RW    Even byte parity corresponding to addressed quadword of data. Bit
                           n represents parity for byte n of addressed quadword.



Note that the state of PCDAP<31:8> are "don't cares" during an IPR write operation. 
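
As a worked example of the DATA_PARITY format, the Python sketch below computes the eight parity bits for a 64-bit quadword. It assumes the conventional meaning of even parity, in which each parity bit is chosen so that the byte plus its parity bit contains an even number of ones; the specification does not spell this out, so treat it as an assumption. The function name `pcdap_parity` is hypothetical.

```python
# Illustrative sketch only: pcdap_parity is a hypothetical helper.
# Assumes "even parity" means byte + parity bit has an even count of ones.
def pcdap_parity(quadword):
    """Return DATA_PARITY<7:0> for a 64-bit quadword."""
    parity = 0
    for n in range(8):
        byte = (quadword >> (8 * n)) & 0xFF
        ones = bin(byte).count("1")
        # Parity bit n is 1 when byte n holds an odd number of ones.
        parity |= (ones & 1) << n
    return parity
```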

12.7 INVALIDATES 

**The Cbox initiates an invalidate by PASSING iAdr<12:5> and InvReq<1:0> RECEIVED FROM
SYSTEM LOGIC qualified by the INVAL command. The INVAL command is latched by the Mbox
in the CBOX_LATCH. The set and index specified are unconditionally invalidated.**

Execution of an INVAL command guarantees that data corresponding to the specified hexaword 
address will not be valid in the Pcache. THE SYSTEM LOGIC IS RESPONSIBLE FOR PRIMARY 
CACHE COHERENCY IN NVAX Plus. The block valid bit and the four corresponding subblock 
valid bits are cleared to guarantee that any subsequent Pcache accesses of this hexaword will 
miss until this hexaword is re-validated by a subsequent Pcache fill sequence. If a cache fill 
sequence to the same INDEX AND SET is in progress when the INVAL is executed, a bit in 
the corresponding MISS_LATCH is set to inhibit any further cache fills from loading data or 
validating data for this cache block. 

Also note that an assertion of C%CBOX_HARD_ERR during a cache fill command causes the cache
fill operation to be processed as if it were an INVAL operation.
12.7.1 ABORTING REFERENCES

The Mbox abort operation is used to cancel the current S5 operation. When an abort is executed, 
the S5 state, which would normally be updated due to execution of the current S5 reference, is not 
updated. The aborted S5 reference is not propagated into S6. Instead, a NOP is introduced into 
the S6 pipe. In effect, an aborted S5 reference is equivalent to a NOP command being executed 
in S5. 

Note that the abort operation should be viewed as only cancelling the current execution of a refer- 
ence. In most cases, aborting an operation does not invalidate the existence of the corresponding 
reference, which will still be stored in one of the reference sources and retried at a later point. 

The abort operation is executed when ABORT is asserted. The following changes to Mbox state 
are inhibited during the cycle in which ABORT is asserted: 

• The reference source which drove the aborted command into S5 does not invalidate the cor- 
responding command. Thus, the reference still exists to be retried during a subsequent 
cycle. 

NOTE 

There are two exceptions to this rule. The CBOX_LATCH is always invalidated
after it drives a command into S5. The EM_LATCH will be invalidated if the Ebox
has explicitly requested it to be (via the E%EM_ABORT signal).

• Loading the PA_QUEUE with a DEST_ADDR or DREAD_MODIFY command is inhibited.
Emptying the PA_QUEUE when a STORE command is driven in S5 is inhibited.

• If the unaligned detection logic detected an unaligned reference during the aborted cycle, the 
VAP_LATCH is not validated to contain the second portion of the unaligned sequence. 

12.8 Conditions for Aborting References 

In general, references are aborted for five reasons: 

• The reference is aborted to prevent a reference order restriction from occurring. 

• The reference is aborted because insufficient hardware resources are available to complete 
processing of the current command. 

• The reference is aborted because a memory management operation must be performed prior 
to execution of the current reference. 

• The reference is aborted in order to avoid a deadlock condition related to unaligned references. 

• The reference is aborted due to an external flush condition. 

12.9 READ_LOCK/WRITE_UNLOCK 

Once a READ_LOCK command has been passed to the Cbox, the Cbox can not process any 
subsequent I-stream read references, and should not receive any D-stream references besides the 
IPR read of STxC pass/fail or a retry of the read_lock, until a STxC pass signal is received from 
the CBOX. 
The arbitration logic accomplishes this by disabling IREF_LATCH selection once a DREAD_LOCK
command has successfully been retired from the S5 pipe. As a result, no IREAD TB_MISS can
occur between the READ_LOCK and the STxC pass, which avoids D-stream references that are
not part of the interlock sequence.

The arbitration logic will re-enable IREF_LATCH selection on either of the following two condi- 
tions: 

1. The STxC IPR is read and the condition indicates pass. This will cause the Cbox to resume 
I-stream read processing. 

2. E%FLUSH_MBOX is asserted by the Ebox due to a hard error. This condition should occur much
less frequently than the above condition because a WRITE_UNLOCK must normally be
issued after a READ_LOCK. However, if an error occurs sometime between the READ_LOCK
and the STxC pass, a hard error microtrap results, preventing a WRITE_UNLOCK
from being issued. The microtrap generates E%FLUSH_MBOX, which re-enables
IREF_LATCH selection because no WRITE_UNLOCK will follow.

**Note that the Cbox state, which prevents subsequent I-stream reads from being processed
before the WRITE_UNLOCK, will be cleared by an IPR_WRITE during the error handler.**

Note that Ibox processing will have been halted prior to the READ_LOCK/WRITE_UNLOCK
sequence. The Ebox microcode will never issue a D-stream read in the middle of a
READ_LOCK/WRITE_UNLOCK sequence.
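
The enable and disable conditions for IREF_LATCH selection can be summarized as a small state machine. The Python sketch below is a behavioral illustration only, not the arbitration logic itself; the class `IrefArbiter` and its method names are hypothetical, and it models a single flag with the three events described in this section.

```python
# Illustrative sketch only: IrefArbiter and its method names are
# hypothetical; this models one enable flag, not the real arbitration logic.
class IrefArbiter:
    def __init__(self):
        self.iref_latch_enabled = True

    def retire_dread_lock(self):
        """A DREAD_LOCK has successfully retired from the S5 pipe."""
        self.iref_latch_enabled = False

    def read_stxc(self, passed):
        """The STxC IPR is read; only a pass re-enables I-stream reads."""
        if passed:
            self.iref_latch_enabled = True

    def flush_mbox(self):
        """E%FLUSH_MBOX asserted, for example by a hard error microtrap."""
        self.iref_latch_enabled = True
```

A failed STxC read leaves IREF_LATCH selection disabled, which matches the rule that only a retry of the read lock or another STxC read is expected next.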

12.10 Pcache Replacement Algorithm 

Each line of Pcache contains an allocation bit which is used to indicate which bank (left or right)
should be used for the next fill sequence of that index. This results in a "not last used" allocation
to the Pcache sets.

When an invalidate clears the valid bits of a particular tag within an index, it only makes sense 
to set the allocation bit to point to the bank select used during the invalidate regardless of which 
bank was last allocated. By doing so, we guarantee that the next allocated block within the 
index will not displace any valid tag because the allocation bit points to the tag that was just 
invalidated. 

For systems that require the Pcache to function as direct mapped, the allocate bit is ignored
during a fill sequence, and the fill follows address[12].
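
The replacement policy can be sketched for a single Pcache index as follows. This Python model is illustrative and makes several assumptions that the text does not state: the names `PcacheIndex`, `LEFT`, and `RIGHT` are hypothetical, the allocation bit initially points at the left bank, address[12]=1 is taken to select the right bank in direct-mapped mode, and the allocation bit is left untouched by direct-mapped fills.

```python
# Illustrative sketch only. Assumptions: initial allocation bit points left,
# address[12] = 1 selects the right bank, and direct-mapped fills do not
# modify the allocation bit.
LEFT, RIGHT = 0, 1


class PcacheIndex:
    def __init__(self):
        self.valid = [False, False]   # one entry per bank
        self.alloc = LEFT             # bank to use for the next fill

    def fill(self, address=0, direct_mapped=False):
        """Fill one bank of this index and return the bank used."""
        if direct_mapped:
            bank = (address >> 12) & 1    # allocation bit is ignored
        else:
            bank = self.alloc
            self.alloc = 1 - bank         # "not last used": point at other bank
        self.valid[bank] = True
        return bank

    def invalidate(self, bank):
        """Invalidate one bank and make it the next allocation target."""
        self.valid[bank] = False
        self.alloc = bank
```

After an invalidate, the next fill lands in the bank that was just invalidated, so it cannot displace the still-valid tag in the other bank.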

12.11 Pcache Redundancy Logic 

Due to the extreme density of the Pcache array, the Pcache has a high susceptibility to manu- 
facturing defects. As a result, redundancy logic was designed in order to provide a mechanism 
which would allow the Pcache to function correctly in the presence of a small number of man- 
ufacturing defects. Refer to NVAX CPU Chip Functional Specification for the description of the 
PCache Redundancy feature. 
12.12 MEMORY MANAGEMENT

The Mbox, the Ebox microcode, and the VMS memory management software implement VAX 
memory management. The Mbox performs the hardware memory management functions nec- 
essary to process most references in a quick efficient manner. The operating system software 
performs all other functions. For a description of the hardware end of VAX memory management, 
the reader is referred to the Memory Management chapter of the "VAX Architecture Standard" 
(DEC STD 032). For a complete description of the software end of VAX/VMS memory
management, the reader is referred to the Memory Management chapters of "VAX/VMS Internals
and Data Structures".

The Mbox is responsible for the following memory management functions: 

• Performing virtual-to-physical address translations. 

• Maintaining a cache of PTEs to perform the quick translations. 

• Performing access mode checks on memory references.

• Performing TNV checks on memory references. 

• Performing M=0 checks on memory references. 

• Directly or indirectly invoking a software memory management exception handler due to ACV 
(Access Violation) or TNV (Translation not Valid) or M=0 faults. 

• Detecting cross-page conditions and performing the corresponding access mode checks. 

12.12.1 ACV/TNV/M=0 Fault Handling:

In order for an ACV, TNV, or M=0 fault to be processed, the following steps must occur: 

1. The Mbox must detect the ACV/TNV/M=0 condition. 

2. The Ebox microcode must be invoked to start processing the condition. 

3. The Ebox microcode must probe Mbox state in order to determine which fault occurred and 
how it should be processed. 

4. The Ebox microcode must service the fault condition directly, or it must invoke an operating
system memory management service routine to service the fault. 

5. If the memory management fault was not fatal to the process, normal instruction execution 
resumes by restarting the instruction corresponding to the memory management fault after 
servicing the fault. 

12.12.2 ACV detection: 

The protection field of a PTE indicates the authorized access rights for each execution mode. 
When a reference causes the TB to access a PTE, the protection field of the PTE corresponding 
to the reference is driven out of the TB. The ACV (Access Violation) detection logic uses the PTE
protection field, M_QUE%S5_AT<1:0>, and the appropriate CPU execution mode from the Ebox (i.e. 
user, supervisor, executive, kernel) to detect access violations. If, for example, the protection 
field indicates a "read-only" access in user mode, the CPU execution mode specifies user mode, 
and M_QUE%S5_AT<1:0> indicates write access, then an ACV condition is flagged since a write 
reference is not allowed to this page in user mode. 
A 2:1 MUX controls the source of the CPU execution mode. The CPU execution mode information
is normally taken directly from the current mode field of the PSL (PSL<25:24>). On PROBE 
references, however, the CPU execution mode is driven from MMGT_MODE<1:0> in order to check 
for ACV conditions for an execution mode which the CPU is not currently in. 
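
The mode selection and protection check can be illustrated with a short Python sketch. It does not use the actual PTE protection field encoding, which is defined in DEC STD 032; instead it assumes a hypothetical descriptor made of two values, the least privileged mode allowed to read and the least privileged mode allowed to write (or `None` if no mode may). The function names are also hypothetical. Modes are numbered as in PSL<25:24>, with 0 for kernel and 3 for user.

```python
# Illustrative sketch only: the (read_limit, write_limit) descriptor is a
# hypothetical stand-in for the PTE protection field, not its real encoding.
KERNEL, EXECUTIVE, SUPERVISOR, USER = 0, 1, 2, 3


def effective_mode(psl_current_mode, mmgt_mode, is_probe):
    """Model the 2:1 MUX: PROBE references use MMGT_MODE<1:0>."""
    return mmgt_mode if is_probe else psl_current_mode


def is_access_violation(read_limit, write_limit, mode, write_intent):
    """Return True if the access is not permitted in the given mode.

    read_limit / write_limit: least privileged mode allowed that access,
    or None if no mode is allowed.
    """
    limit = write_limit if write_intent else read_limit
    return limit is None or mode > limit
```

With a page that is read-only in user mode, `is_access_violation(USER, None, USER, True)` flags the write described in the example above, while the corresponding read is allowed.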

An ACV condition is also generated when a PTE reference fails to satisfy the page length check 
corresponding to the virtual space of the reference or when the virtual reference falls into S1
space. A virtual address in S1 space is reported as an ACV length violation.

An ACV check is also performed on the protection field of all PTEs which have just been sent to 
the Mbox due to an earlier Mbox DREAD issued during the TB_MISS sequence. 

ACV protection and length checks are performed on all Ibox and Ebox references and on all MME_ 
CHKs. ACV page length checks are performed on all PTE addresses. However, ACV protection 
checks are never performed on PTE read references generated by the Mbox. 

Note that the ACV protection condition is disabled from occurring during any cycle where the 
reference is aborted. 

When an ACV condition occurs, the MME_SEQ is invoked to execute the ACV/TNV/M=0 sequence.
ACV checks only occur on virtual addresses when memory management is enabled and when the 
reference indicates that memory management checks should be done (i.e. M_QUE%S5_QUAL<2> = 
1). 

12.12.2.1 TNV detection

When the PTE valid bit is clear, it indicates that the corresponding PTE page frame address 
translation is not valid. This is called a Translation Not Valid Fault (TNV). TNV detection only 
occurs during the TB_MISS sequence when the Mbox receives PTE data from the Pcache or 
Cbox such that the PTE valid bit (PTE<31>) is clear. When a TNV fault is detected, the MME_ 
SEQ interrupts the TB_MISS sequence and invokes the ACV/TNV/M=0 sequence. By doing so, 
the invalid PTE is never cached in the TB and a memory management fault is recorded (See 
Section 12.12.2.3 on recording memory management faults). 

12.12.2.2 M=0 detection: 

When a virtual reference causes the TB to access a PTE, the modify bit of the PTE is read out 
of the TB. A cleared modify bit indicates that the corresponding page has not been written to. If 
the valid bit of the PTE is set, and the modify bit is clear and the access type of the S5 reference 
indicates an intention to modify the page (e.g. write or modify OR VSTR access type), then the 
Mbox must initiate the proper sequence of events to process this "M=0" condition. The M=0 check 
is performed when memory management is enabled and a virtual reference hits in the TB. 

Note that the M=0 condition is disabled from occurring during any cycle where the reference is 
aborted. 
12.12.2.3 Recording ACV/TNV/M=0 Faults

In order for the microcode to determine the nature of the memory management fault detected 
by the Mbox, the Mbox must record the necessary fault information. The fault information is 
recorded in Mbox IPRs which can be read by Ebox microcode. The fault information is stored in 
three of the registers in the MME register file which are accessible to microcode by IPR reads 
and writes: 

• The MMEADR register stores the virtual address associated with the ACV, TNV or M=0 fault. 
As per SRM requirements, if the ACV/TNV fault occurred by referencing a PTE during a TB 
miss sequence, the MMEADR stores the original address and not the PTE address. 

• The MMEPTE register stores the virtual or physical address of the Page Table Entry corre- 
sponding to a virtual reference upon which an M=0 condition has been detected. 

• The MMESTS register stores state which indicates to the microcode the context and type of 
fault corresponding to the ACV/TNV/M=0 condition. The format of MMESTS is shown in Figure 12-13.

Due to the macropipeline design, the MMEADR, MMEPTE and MMESTS registers must be 
conditionally loaded in a prioritized fashion. These registers are loaded depending on the relative 
states of their current contents and on the context of the current fault. If the MMESTS register 
is empty, the current fault state is always loaded. If the MMESTS register contains a valid 
fault condition, the MMEADR, MMEPTE and MMESTS are only loaded if the current fault is 
associated with a pipe stage further along in the pipe than the stage corresponding to the stored 
MMESTS state. This loading priority is necessary because these memory management faults 
must be reported within the context of the execution of the instruction they are associated with. 
A fault detected on an Ebox reference is loaded provided that another Ebox reference fault is 
not already loaded. Faults detected on Ibox specifier references are only loaded if no Ebox or 
Ibox specifier reference fault is currently stored. Faults on Ibox I-stream references are only 
loaded if the MMESTS register is empty. In effect, the MMESTS register captures the first 
memory management exception that will be associated with Ebox execution. Stated differently, 
it captures the fault which occurs farthest along in the macropipeline. 

The LOCK field of MMESTS specifies the source of the faulting reference currently stored in 
MMESTS. Thus, the decision to load another faulting reference into MMESTS is made by exam- 
ining the bits of the LOCK field. 

The FAULT field is set in a prioritized manner. That is, an ACV fault takes precedence over 
a TNV or M=0 fault. A TNV fault takes precedence over an M=0 fault. Therefore, if multiple 
pending fault conditions are true, only the fault condition with the highest priority is reported in 
the MMESTS register. 
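
The two prioritization rules above, which reference source may overwrite the stored fault and which fault type is reported, can be sketched in Python as follows. This is a behavioral illustration under stated assumptions: the register file is modeled as a plain dictionary, the function names are hypothetical, and MMEPTE is omitted. Because the LOCK encodings 000, 001, 011, and 111 increase with pipeline depth, a numeric comparison is enough to express the load rule.

```python
# Illustrative sketch only: function names and the dictionary register model
# are hypothetical. LOCK and FAULT encodings follow Tables 12-6 and 12-7.
UNLOCKED, IREAD, IBOX_SPEC, EBOX = 0b000, 0b001, 0b011, 0b111
ACV, TNV, M0 = 0b01, 0b10, 0b11


def fault_field(acv, tnv, m0):
    """Return the FAULT encoding for the highest priority pending fault."""
    if acv:
        return ACV
    if tnv:
        return TNV
    if m0:
        return M0
    return None  # no fault pending


def record_fault(regs, source, fault, address):
    """Load the fault only if it is further along the pipe than the stored one."""
    if source <= regs["LOCK"]:
        return False
    regs["LOCK"] = source
    regs["SRC"] = ~source & 0b111   # complemented shadow copy of LOCK
    regs["FAULT"] = fault
    regs["MMEADR"] = address
    return True


def flush_mbox(regs):
    """E%FLUSH_MBOX clears LOCK; SRC is deliberately left unchanged."""
    regs["LOCK"] = UNLOCKED
```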

When the Ebox starts the memory management exception microflow, it issues an IPR_RD to the
MMESTS to determine the nature of the memory management fault. The MMESTS register is
automatically unlocked by resetting the LOCK field when the E%FLUSH_MBOX signal is asserted
by the Ebox.
12.13 MBOX ERROR HANDLING

The Mbox plays a role in the processing of the following types of errors:

• TB tag parity errors. 

• TB data parity errors. 

• Pcache tag parity errors. 

• Pcache data parity errors. 

• Errors encountered by the Cbox while processing a memory read, I/O space read, or IPR_RD 
which were transferred from the Mbox to the Cbox. Note that these errors could originate 
from the Bcache or memory subsystem.

All other possible errors are handled without Mbox involvement. 

12.13.1 Recording Mbox errors 

The Mbox contains four error registers. Two are used to record TB parity errors and the other 
two are used to record Pcache parity errors. 

12.13.1.1 TBSTS and TBADR 

When a TB parity error is detected with LOCK=0, TBADR is loaded with the virtual address 
which caused the TB parity error, and all fields of TBSTS are updated to record the nature of 
the TB parity error. Note that both the TPERR and DPERR bits can be set at the same time if 
these two error conditions occurred during the same cycle. When a TB parity error is recorded, 
the LOCK bit is set to validate the contents of both TBSTS and TBADR registers. When LOCK 
is set, all bits of both registers are frozen and cannot be changed until the LOCK bit is cleared. 
Thus, any subsequent error is not recorded if LOCK=1.

When the operating system error handler is invoked, TBSTS and TBADR will be read through an 
IPR_RD command in order to determine if any TB parity errors were recorded. If the state of the 
LOCK bit was read to be a zero, then no error has occurred and the remaining state information 
in these two registers is invalid. If the LOCK bit was found to be set, then the remaining error 
state of these two registers characterizes the nature of the recorded error. 

Once the error handler has read these registers, it re-enables TBSTS to record any new errors by 
clearing the LOCK bit. Clearing the LOCK bit is accomplished by writing a "1" to LOCK through 
an IPR_WR operation. 
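
The record, read, and clear cycle can be modeled as below. This Python sketch is a simplified illustration: the class `TbErrorRegs` and its methods are hypothetical, only the LOCK, DPERR, and TPERR bits of TBSTS are modeled, and it assumes that writing a one to LOCK clears LOCK alone and leaves the other bits as they were.

```python
# Illustrative sketch only: TbErrorRegs is a hypothetical model covering
# LOCK, DPERR and TPERR. Clearing LOCK is assumed to leave other bits alone.
LOCK, DPERR, TPERR = 1 << 0, 1 << 1, 1 << 2


class TbErrorRegs:
    def __init__(self):
        self.tbsts = 0
        self.tbadr = 0

    def record_error(self, virtual_address, tperr=False, dperr=False):
        """Record a TB parity error unless the registers are locked."""
        if self.tbsts & LOCK:
            return False   # LOCK=1: subsequent errors are not recorded
        self.tbadr = virtual_address
        self.tbsts = LOCK | (TPERR if tperr else 0) | (DPERR if dperr else 0)
        return True

    def ipr_write_tbsts(self, value):
        """Write-one-to-clear: writing 1 to bit 0 clears LOCK."""
        if value & LOCK:
            self.tbsts &= ~LOCK
```

An error handler following the text would read both registers, act on them only if LOCK was set, and then call the equivalent of `ipr_write_tbsts(1)` to re-arm error recording.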

12.13.1.2 PCSTS and PCADR

The PCSTS and PCADR record Pcache tag and data parity errors. The function and operation 
of these registers is identical to the TBSTS and TBADR registers except that the PCADR stores 
physical quadword addresses rather than virtual byte addresses, and it also records PTE hard 
error events. The definitions of these registers are shown in Figure 12-16 and Figure 12-17. Note
however, that when PCSTS<0> is set, Pcache memory reads, writes and invalidates are disabled. 
12.13.2 Mbox Error Processing

12.13.2.1 Processing Cbox errors on Mbox-initiated read-like sequences 

The Cbox detects errors that occur in the Bcache, or memory subsystem. When the Cbox detects 
one of these errors, and it is associated with an Mbox-initiated reference that requires data to 
be returned (e.g. memory read, I/O space read, or IPR read), the Mbox must transfer the error 
status of the reference back to the destination corresponding to the reference. The Mbox never 
records a Cbox-detected error in Mbox error registers because the error is logged in Cbox error 
registers. 

12.13.2.1.1 Cbox-detected ECC errors 

The Cbox returns requested data through an I_CF or D_CF command to the Mbox while
simultaneously checking the error-correction code for a possible Bcache error. If an ECC error
is found, the Cbox asserts C%CBOX_ECC_ERR. This causes the Mbox to latch a NOP in the
CBOX_LATCH rather than the cache fill. As a result, the Mbox does not perform any Pcache state
updates resulting from the bad data nor does it assert M%VIC_DATA, M%IBOX_DATA, M%EBOX_DATA,
or M%MBOX_DATA to indicate the presence of valid data.

C%CBOX_ECC_ERR IS ALSO USED BY THE CBOX LOGIC AS A LATE ABORT FOR FILL DATA 
DUE TO A MISS OR CACHE TAG COMPARE NOT VALID DUE TO SYSTEM LOGIC OWNING 
THE CACHE DURING THE READ/PROBE CYCLE. 

During subsequent cycles, the Cbox will determine if the ECC error is correctable or not. If it 
is, the data will be corrected and returned. If the data is not correctable, a Cbox-detected hard 
error has occurred and will be dealt with as described below. 

12.13.2.1.2 Cbox-detected hard errors on requested fill data 

If the Cbox has determined that the requested data cannot be returned for some reason, the 
Cbox drives a cache fill command qualified by C%CBOX_HARD_ERR. When this happens, the Mbox 
performs the following actions: 

1. The assertion of C%CBOX_HARD_ERR indicates to the Mbox that the cache fill data is invalid. 
Thus, the Mbox returns the invalid data on the M%MD_BUS in the same manner that all data
is returned except that the data is further qualified by M%HARD_ERR. M%HARD_ERR informs 
the receiver that the data is invalid and that the requested data cannot be returned due to a 
hard error. 

2. Once the Cbox detects a hard error on the requested data, the Cbox immediately terminates 
the pending fill sequence by the assertion of C%LAST_FILL. Thus, no further data correspond- 
ing to the same fill sequence will be returned and the Mbox fill sequence corresponding to 
the error is terminated by invalidating the corresponding MISS_LATCH.

3. An I_CF or D_CF command which is qualified by C%CBOX_HARD_ERR is interpreted by the 
Pcache as an INVAL command. Thus the invalid data is not filled in the Pcache. 
12.13.2.1.3 Cbox-detected hard errors on non-requested fill data

The Cbox performs the same actions as described above to indicate the presence of a hard error 
regardless of whether the data is the requested data or just one of the other three pieces of fill 
data for the corresponding Pcache block. If the data is non-requested fill data, the Mbox performs
the following actions: 

1. Once the Cbox detects a hard error on the non-requested data, the Cbox immediately termi- 
nates the pending fill sequence by the assertion of C%LAST_FILL. Thus, no further data corre- 
sponding to the same fill sequence will be returned and the Mbox fill sequence corresponding 
to the error is terminated by invalidating the corresponding MISS_LATCH.

2. An I_CF or D_CF command which is qualified by C%CBOX_HARD_ERR is interpreted by the
Pcache as an INVAL command. Thus the invalid fill data is not filled in the Pcache and
all previous fills to the same block are invalidated. This is necessary in order to maintain
coherency between the Pcache and Bcache because a Bcache data block will only be validated
if all the data within the block is error-free.



12.13.2.2 Mbox Error Processing Matrix 

The following table summarizes all Mbox error handling. A blank entry in the table means that
the corresponding error cannot occur for the given reference. 



Table 12-14: Mbox Error Handling Matrix

Command   TB tag parity error   TB data parity error   Pcache tag parity error   Pcache data parity error   Cbox hard error

Ibox references 



IREAD A A B D F 

DREAD A A B D F 

DREAD_MODIFY A A B D F

DEST_ADDR A A

STOP_SPEC_Q



Ebox references 



DREAD A A B D F 

DREAD_LOCK A A B F

STORE C 

WRITE A A C 

WRITE_UNLOCK A A C

IPR_RD (to Pcache)

IPR_RD (non-Mbox) F 

IPR_WR (to Pcache) 
IPR_WR (non-Mbox)

PROBE A A 

MME_CHK A A 

TB_TAG_FILL

TB_PTE_FILL

TBIS 

TBIP 

TBIA 

LOAD_PC 



Mbox references 

PTE DREAD A A B 

TB_TAG_FILL 

TB_PTE_FILL A 

IPR_DATA

MME_CHK A A 



Cbox references 

INVAL E 

D_CF H

I_CF H 

LEGEND: 
A. 

• Mbox microtraps Ebox by assertion of M%TB_PERR_TRAP during cycle error was detected. 

• The faulting reference and all pending Ibox and Ebox references are blown away. 

• TBIA command is issued to invalidate entire TB. 

• TBSTS and TBADR are updated appropriately. 

B. 

• A Pcache miss condition is forced to occur on this read reference causing the assertion of
M%CBOX_REF_ENABLE. This instructs the Cbox to continue processing the read reference.

• M%MBOX_S_ERR is asserted to post a soft error interrupt.

• PCSTS and PCADR are updated appropriately (a side effect of this operation turns off
the Pcache).

C.

• The Cbox continues to process the write reference, as is done on all write operations 
regardless of a Pcache parity error. 

• M%MBOX_S_ERR is asserted to post a soft error interrupt. 

• PCSTS and PCADR are updated appropriately (a side effect of this operation turns off 
the Pcache). 

D. 

• M%CBOX_LATE_EN is asserted to instruct the Cbox to continue processing the reference 
which caused the Pcache parity error. 

• M%MBOX_S_ERR is asserted to post a soft error interrupt.

• PCSTS and PCADR are updated appropriately (a side effect of this operation turns off 
the Pcache). 

E. 

• The invalidate operation takes place in spite of the tag parity error because the invalidate 
is only a function of matching all tag bits. 

• M%MBOX_S_ERR is asserted to post a soft error interrupt.

• PCSTS and PCADR are updated appropriately (a side effect of this operation turns off 
the Pcache). 

F. 

• The Cbox indicated a hard error for a non-PTE read or IPR_RD operation by the assertion
of C%CBOX_HARD_ERR and C%LAST_FILL.

• If the hard error corresponded to the data explicitly requested by the Mbox reference,
M%HARD_ERR qualifies M%MD_BUS data, indicating to the M%MD_BUS receiver that a hard
error occurred while accessing the requested data.

• The fill sequence is immediately terminated by the assertion of C%LAST_FILL, and the
entire Pcache block corresponding to the fill is invalidated.

G. 

• The hard error detected by the Cbox on this Mbox-issued PTE DREAD is recorded in
PCSTS. The TB miss sequence is immediately terminated.

• If the error resulted from an Ibox reference, the error is tagged back to the appropriate
Ibox reference latch. The error is then signaled via M%HARD_ERR when the requested
data is returned on M%MD_BUS, or is reported through PA_Q_STATUS<2> (for DEST_ADDR
commands).

• If the original reference came from the Ebox, M%MME_TRAP is asserted (in all cases except
for PROBE references). This will invoke the memory management fault handler in order
to try to report the hard error within the context of the execution of the instruction.

• The fill sequence is immediately terminated by the assertion of C%LAST_FILL, and the
entire Pcache block corresponding to the fill is invalidated.

H.

• C%CBOX_HARD_ERR was asserted by the Cbox during an I_CF or D_CF command. This is the
mechanism by which the Cbox informs the Mbox of a hard error during a read or IPR_RD
operation where the Cbox must return data. See the error responses specified by F and
G for the error response within the context of the original read operation.

12.14 MBOX INTERFACES 

The Mbox passes data and/or control information to four other sections of the NVAX chip. These 
sections are: 1) Ibox, 2) Ebox, 3) Useq and 4) Cbox. The Cbox interface has additional signals for 
NVAX Plus and is described in this section. Refer to the NVAX CPU Chip Functional Specification 
for MBOX interface signal definitions to the IBOX, EBOX, and Useq. 

12.14.1 Signals from Cbox 

• C%CBOX_CMD<1:0>: Command field of Cbox reference sent to Mbox. 

• C%CBOX_ADDR<12:5>: Invalidate address of Cbox reference sent to Mbox. 

• C%MBOX_FILL_QW<4:3>: Indicates the aligned quadword within the aligned hexaword.

• C%REQ_DQW: Qualifies the current D_CF to indicate that this is the requested data.

• B%S6_DATA<63:0>: Data of Mbox reference seen by Cbox.

• C%S6_DP<7:0>: Even data parity corresponding to B%S6_DATA<63:0> during cache fill
references.

• C%LAST_FILL: When asserted, indicates that this is the last fill sent for the current sequence.

• C%CBOX_HARD_ERR: When asserted while the Cbox is driving data onto the B%S6_DATA bus,
indicates that data on M%MD_BUS is associated with a non-recoverable hard error.

• C%CBOX_ECC_ERR: Indicates that an ECC error is associated with the Cbox data being
returned.

• C%WR_BUF_BACK_PRES: Indicates that Cbox cannot accept any more entries in its write buffer.

• C%DRACK_NOCACHE_H: Indicates present fill block should not be placed in Pcache. 

12.14.2 Signals to Cbox 

• M%S6_SET_NUM_H: Pcache allocation bit. Allows the Cbox to broadcast to system
backmaps.

• M%S6_CMD<4:0>: Command field of Mbox reference seen by Cbox.

• M%S6_PA<31:3>: Quadword physical address of Mbox reference seen by Cbox.

• M%C_S6_PA<2:0>: Address within addressed quadword of Mbox reference seen by Cbox.

• B%S6_DATA<63:0>: Data of Mbox reference seen by Cbox. 

• M%S6_BYTE_MASK<7:0>: Byte mask field of Mbox reference seen by Cbox. 

• M%CBOX_REF_ENABLE: Indicates that current S6 read reference packet should be latched and 
processed by the Cbox. This signal is a don't care on write operations. 

• M%CBOX_LATE_EN: Asserted at the end of a cycle to indicate that a Pcache parity error was
detected. As a result, the Cbox must continue to process this reference regardless of what
M%CBOX_REF_ENABLE indicated.

• M%ABORT_CBOX_IRD: Indicates that any IREAD which the Cbox may be processing should be
immediately terminated.

• M%CBOX_BYPASS_ENABLE: Indicates that the Cbox may drive B%S6_DATA<63:0> during the
following cycle in order to attempt a data bypass. 

12.15 INITIALIZATION 

12.15.1 Initialization by Microcode and Software 

It is the responsibility of the power-up microcode to perform an IPR_WRITE operation to clear
MAPEN before any virtual memory references are issued to the Mbox from either the Ebox or 
Ibox. Failure to clear MAPEN could result in UNDEFINED behavior prior to complete memory 
management state initialization. 

PAMODE is also cleared by the power-up microcode via an IPR_WRITE command. If the system 
configuration requires a 32 bit program-visible physical address space, setting the PAMODE value 
via an IPR_WRITE must be done under very controlled conditions because writes to the PAMODE 
processor register affect both physical address generation and interpretation of PTEs. With the 
possible exception of certain diagnostic code, writes to the PAMODE processor register should 
not be performed while memory management is enabled. With memory management disabled, 
writes to the PAMODE processor register should not be performed unless the PC of the MTPR 
instruction which writes to the register is in one of the following (hex) address ranges: 

00000000..1FFFFFFF
E0000000..FFFFFFFF

By restricting PC to one of these address ranges, changes to the PAMODE register do not cause 
the generated physical address to change in going from 30-bit mode to 32-bit mode, or vice versa. 
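
The PC restriction above can be sketched as a simple range check. This is only an illustrative model, not microcode or console code: the function name and the `mapen` argument are hypothetical, and only the two hex ranges and the "memory management disabled" condition come from the text.

```python
# Hypothetical helper illustrating the PAMODE write rule stated above.
# The two PC ranges are taken from the text; everything else is an assumption.
SAFE_PC_RANGES = (
    (0x00000000, 0x1FFFFFFF),
    (0xE0000000, 0xFFFFFFFF),
)


def pamode_write_allowed(pc, mapen):
    """Return True if an MTPR to PAMODE at this PC follows the stated rule.

    pc    -- PC of the MTPR instruction that writes PAMODE
    mapen -- True if memory management is currently enabled
    """
    if mapen:
        # Writes should not be performed while memory management is enabled
        # (the text allows a possible exception for diagnostic code).
        return False
    return any(lo <= pc <= hi for lo, hi in SAFE_PC_RANGES)
```

For example, a console routine executing at PC 10000000 (hex) with MAPEN clear passes the check, while one executing at 20000000 (hex) does not.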

The console code should be executing in the specified range in order to write to the PAMODE 
processor register, and it is expected that this is the place where the PAMODE processor register 
will be initialized. 

In uncontrolled conditions, writes to the PAMODE processor register can cause UNDEFINED 
results. 

12.15.1.1 Pcache Initialization 

The Pcache is disabled by the power-up initialization sequence. In order to enable the Pcache, 
the following sequential actions must be performed: 

1. Pcache IPR_WRITE operations must be performed to each Pcache tag to write the tag field
to a known state, set the tag parity bit to the corresponding value, and clear the subblock
valid bits.

2. An IPR_WRITE to the PCCTL must be done to enable the Pcache in the desired operation
mode. 

Note that the data array need not be initialized because correct parity will be written into the data 
array whenever fill data is validated, and data parity is only checked on validated sub-blocks. 

If the SROM is read, the Pcache tags are initialized by microcode as the serial data is written to
the Pcache. 

12.15.1.2 Memory Management Initialization 

Memory management is disabled by MAPEN being cleared by the power-up microcode. Before 
memory management can be turned on, the following actions must be performed: 

• The Ebox must issue a TBIA command to invalidate the TB and reset the NLU pointer to a 
known state. This is done as part of the microcode processing of an MTPR to MAPEN. 

• The Ebox must write the appropriate values into the six memory base and length registers 
via IPR_WRITE commands. 

Once this is done, the Ebox may turn on memory management by setting MAPEN through an
IPR_WRITE command.

12.16 Mbox Testability Features 

This section describes the testability features used for Mbox testing and the
Mbox signals used for each testability function. For a global understanding of NVAX
testability, and for a detailed description of each testability strategy and hardware mechanism, the
reader is referred to Chapter 17.

12.16.1 Internal Scan Register and Data Reducers 

The following Mbox signals exist in the scan chain: 

— S5_PA<31:0>

— S5_TAG<5:0> 

— S5_DL<1:0> 

— S5_AT<1:0> 

— S5_DEST<1:0> 

— S5_QUAL<6:0> 

— PA_Q_STATUS<2:0> 

— M%MME_TRAP 

— IREF_LATCH valid bit 

— SPEC_QUEUE valid bits (2)

— EM_LATCH valid bit

— VAP_LATCH valid bit

— MME_LATCH valid bit

— RTY_DMISS_LATCH valid bit

— CBOX_LATCH valid bit

— M%CBOX_BYPASS_ENABLE

— M%CBOX_REF_ENABLE

— M%EM_LAT_FULL

Note that only S5_PA<31:0> contains a data reducer. Implementing a data reducer on this bus should
provide coverage for the Mbox S5 pipe as well as coverage for the Ibox, Ebox and Cbox logic which 
issue references to the Mbox. 

12.16.2 Nodes on Parallel Port 

The following signals are observable via the Parallel Port: 

— S5_CMD<4:0> 

— Current Reference Source (3 encoded bits). The encodings are as follows: 

Reference Source                        Encoding

NOP or PA_QUEUE (when cmd = STORE)      000
IREF_LATCH                              001
SPEC_QUEUE                              010
EM_LATCH (when cmd != STORE)            011
VAP_LATCH (when cmd != STORE)           100
MME_LATCH                               101
RTY_DMISS_LATCH                         110
CBOX_LATCH                              111

— M%ABORT 

— M%TB_MISS 

— M%PCACHE_MISS 

— MME state machine state bits (4 encoded bits). The encodings are as follows: 

State Name              Encoding

home                    0000
tb_miss_1               0001
tb_miss_2               0010
tb_miss_3               0011
tb_miss_4               0100
tb_miss_5               0101
doub_tb_miss_1          0110
doub_tb_miss_2          0111
doub_tb_miss_3          1000
doub_tb_miss_4          1001
mme_1                   1010
mme_2                   1011
ipr_rd_1_tb_per_2       1100
xpage_1                 1101
tb_per_1                1110
undefined               1111


— MD_BUS Qualifiers (3 encoded bits). The encodings are as follows:

Event                   Encoding

undefined               000
Ibox data               001
Ebox data               010
Ibox and Ebox data      011
VIC data                100
Ibox IPR data           101
undefined               110
Mbox data               111



— M%MME_FAULT 
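
The encoded Parallel Port fields listed above lend themselves to simple lookup tables. The following is a minimal sketch of a decoder a test bench might use; the dictionary and function names are hypothetical, and the state names reflect one reading of the MME state table in this section.

```python
# Illustrative lookup tables for the encoded Parallel Port fields.
# Names of the tables and the helper are assumptions, not part of the chip.
REFERENCE_SOURCE = {
    0b000: "NOP or PA_QUEUE",
    0b001: "IREF_LATCH",
    0b010: "SPEC_QUEUE",
    0b011: "EM_LATCH",
    0b100: "VAP_LATCH",
    0b101: "MME_LATCH",
    0b110: "RTY_DMISS_LATCH",
    0b111: "CBOX_LATCH",
}

MME_STATE = {
    0b0000: "home",
    0b0001: "tb_miss_1",
    0b0010: "tb_miss_2",
    0b0011: "tb_miss_3",
    0b0100: "tb_miss_4",
    0b0101: "tb_miss_5",
    0b0110: "doub_tb_miss_1",
    0b0111: "doub_tb_miss_2",
    0b1000: "doub_tb_miss_3",
    0b1001: "doub_tb_miss_4",
    0b1010: "mme_1",
    0b1011: "mme_2",
    0b1100: "ipr_rd_1_tb_per_2",
    0b1101: "xpage_1",
    0b1110: "tb_per_1",
    0b1111: "undefined",
}

MD_BUS_QUALIFIER = {
    0b000: "undefined",
    0b001: "Ibox data",
    0b010: "Ebox data",
    0b011: "Ibox and Ebox data",
    0b100: "VIC data",
    0b101: "Ibox IPR data",
    0b110: "undefined",
    0b111: "Mbox data",
}


def decode_field(table, bits):
    """Look up an encoded field value, masking it to the table's width."""
    width = (len(table) - 1).bit_length()
    return table[bits & ((1 << width) - 1)]
```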

12.16.3 Architectural features 

All MBOX IPRs can be invoked through the use of MTPR or MFPR macroinstructions. See 
the Architectural Summary Chapter for a list of all Mbox IPR addresses. Note that Mbox IPR 
addresses referenced through the MxPR instruction are translated by the Ebox microcode into 
IPR_RD, IPR_WR, TBIS, TBIA, or PROBE operations before being issued to the Mbox.

12.16.3.1 Translation Buffer Testability 

The diagnostic user can invalidate the entire TB array by executing an MTPR instruction which 
addresses the TBIA IPR. This operation will also reset the NLU pointer. The user can invalidate 
any virtual page address which may be cached in the TB by executing an MTPR addressing the TBIS
IPR. 

The diagnostic user can explicitly query the TB to determine if a given tag is validated and 
stored in the TB. This is accomplished by addressing the Translation Buffer Check IPR through 
the MTPR instruction. 

Every TB entry can be explicitly filled and validated by the diagnostic user through the use of the 
TB_TAG_FILL and TB_PTE_FILL commands. The entry on which these two commands operate 
at any given time is addressed by the NLU pointer. The NLU pointer is a round robin pointer 
which increments when a TB_PTE_FILL is executed or when a tag match is detected on the entry 
which the NLU pointer is currently pointing to. The NLU pointer is reset to point to the 0th
entry whenever a TBIA command is executed. 
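
The NLU pointer behavior described above can be modeled in a few lines. This is a behavioral sketch only: the class name and the `entries` parameter are hypothetical (the TB size is not stated in this section), and the model ignores everything except the three events named in the text.

```python
class NluPointer:
    """Behavioral model of the round-robin NLU pointer (illustrative only)."""

    def __init__(self, entries):
        # 'entries' is an assumed parameter: the number of TB entries.
        self.entries = entries
        self.index = 0

    def _advance(self):
        self.index = (self.index + 1) % self.entries

    def tb_pte_fill(self):
        """A TB_PTE_FILL fills the addressed entry, then the pointer increments."""
        self._advance()

    def tag_match(self, entry):
        """A tag match on the entry the pointer addresses also increments it."""
        if entry == self.index:
            self._advance()

    def tbia(self):
        """A TBIA command resets the pointer to entry 0."""
        self.index = 0
```

A diagnostic that wants to fill a specific entry can therefore issue a TBIA and then step the pointer with fills until it reaches the entry of interest.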

12.16.3.2 Pcache Testability 

Every bit in the Pcache can be read and written by the user through DREAD, WRITE, IPR_RD
and IPR_WR operations. Pcache data is accessed by DREADs and WRITEs. All other bits (tag, valid
bits and parity bits) are accessed through Mbox IPRs.

The operational mode of the Pcache can be changed to accommodate testing the array. The mode
is controlled by the Pcache Control Register (PCCTL) which can be read and written as an Mbox 
IPR. The PCCTL allows the user to: 

1. Enable/disable D-stream and/or I-stream operations to the Pcache. 

2. Allow the Pcache to operate in a direct mapped force hit mode. 

3. Enable/disable Pcache parity checks. 

12.17 Mbox Performance Monitor Hardware 

Hardware exists in the Mbox to support the NVAX Performance Monitoring Facility. See 
Chapter 16 for a global description of this facility. 

The Mbox hardware generates two signals, M%PMUX0 and M%PMUX1, which are driven to the
central performance monitoring hardware residing in the Ebox. These two signals are used to 
supply Mbox performance data for the purpose of recording performance statistics. Seven Mbox 
performance monitoring functions exist. The function to be executed is specified by the PMM 
field of the PCCTL register. 



Table 12-15: Mbox Performance Monitor Modes

PCCTL<7:5>   Performance Monitor Mode

000          TB hit rate for P0/P1 Space I-stream Reads
001          TB hit rate for P0/P1 Space D-stream Reads
010          TB hit rate for S0 Space I-stream Reads
011          TB hit rate for S0 Space D-stream Reads
100          Pcache hit rate for I-stream Reads
101          Pcache hit rate for D-stream Reads
110          illegal mode; results are UNPREDICTABLE
111          ratio of unaligned virtual reads and virtual writes to total virtual reads
             and virtual writes
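
As a small illustration of the table above, the following sketch extracts the PMM field from a PCCTL value and names the selected mode. The function and table names are hypothetical; only the PCCTL<7:5> position and the mode descriptions come from Table 12-15.

```python
# Mode descriptions copied from Table 12-15; names here are illustrative.
PMM_MODES = {
    0b000: "TB hit rate for P0/P1 Space I-stream Reads",
    0b001: "TB hit rate for P0/P1 Space D-stream Reads",
    0b010: "TB hit rate for S0 Space I-stream Reads",
    0b011: "TB hit rate for S0 Space D-stream Reads",
    0b100: "Pcache hit rate for I-stream Reads",
    0b101: "Pcache hit rate for D-stream Reads",
    0b110: "illegal mode (results UNPREDICTABLE)",
    0b111: "ratio of unaligned virtual reads and writes to total virtual reads and writes",
}


def pmm_mode(pcctl):
    """Return the performance monitor mode selected by PCCTL<7:5>."""
    return PMM_MODES[(pcctl >> 5) & 0b111]
```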

Who            When          Description of change

Bill Wheeler   8-May-1990    Other tweaks
Bill Wheeler   27-Feb-1990   Add perf monitor hardware. Other tweaks
Bill Wheeler   15-Jan-1990   Signal name change
Bill Wheeler   20-Nov-1989   Final Changes prior to review for Rev 1.0 Release
Bill Wheeler   23-Aug-1989   More Updates
Bill Wheeler   31-Jul-1989   Spec Update
Bill Wheeler   06-Mar-1989   For External Release
Bill Wheeler   30-Nov-1988   Initial Release
Gil Wolrich    15-Nov-1990   NVAX Plus External Release
Gil Wolrich    1-Aug-1991    update

Chapter 13
NVAX Plus CBOX 



13.1 Functional Overview 

The NVAX Plus and NVAX processors contain common IBOX, EBOX, FBOX, and MBOX internal
functionality. The NVAX external interface is to a backup cache and I/O NDAL bus, while the
NVAX Plus external interface is a common cache/memory bus used by EV processors. While the
MBOX interface section of the CBOX is similar for NVAX and NVAX Plus, the EDAL bus interface
sections of NVAX Plus replace the TAG, DATA, and NDAL/BIU sections of the NVAX CBOX.

The NVAX Plus CBOX receives read and write requests from the MBOX. The CBOX initiates
bus cycles and sends fill data to the MBOX. Invalidates are initiated by external logic and sent
to the MBOX under CBOX control.

For reads, the tag and data stores are read together. If the tag matches and the valid bit is set, the
associated data is returned to the MBOX. If the read misses, a READ_BLOCK request is sent to
the system logic. NVAX Plus waits for the system to update the cache and deliver the requested
data to a 32 byte Input Buffer.

If NVAX Plus is not in "PV" mode, writes require a probe cycle in which the tag and state bits are
read. If the probe indicates a tag match for a valid block which is not shared, then NVAX Plus
writes the data store. If the write probe indicates a miss or the block is shared, NVAX Plus sends
a WRITE_BLOCK command to the system logic. The WRITE_BLOCK command has an eight bit
longword mask associated with it indicating the longwords which are to be updated. The write
data is placed in a 32 byte Output Buffer. The write is completed under external control.

If NVAX Plus is in "PV" mode, a WRITE_BLOCK command is initiated and the Bcache is
not probed. The cWMask_h lines contain byte mask rather than longword mask information.
dataWE_h<1:0> and dataA_h<3> also supply additional information in order to construct 16 byte
enables.

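The read and write handling described above reduces to a small decision procedure. The sketch below models it under stated assumptions: the function names and the returned strings "RETURN_DATA" and "WRITE_DATA_STORE" are hypothetical labels, while READ_BLOCK and WRITE_BLOCK are the bus commands named in the text.

```python
def read_action(tag_match, valid):
    """Outcome of a read after the tag and data stores are read together."""
    if tag_match and valid:
        return "RETURN_DATA"      # hit: data is returned to the MBOX
    return "READ_BLOCK"           # miss: request sent to the system logic


def write_action(pv_mode, tag_match=False, valid=False, shared=False):
    """Outcome of a write, following the PV and non-PV rules above."""
    if pv_mode:
        # In "PV" mode the Bcache is not probed at all.
        return "WRITE_BLOCK"
    if tag_match and valid and not shared:
        # Probe hit on a valid, non-shared block: write the data store.
        return "WRITE_DATA_STORE"
    # Probe miss, or the block is shared.
    return "WRITE_BLOCK"
```

Note that in this model a shared hit and a miss are indistinguishable to the caller; both hand the write to the system logic.
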
For an NVAX Plus EDAL bus system:

• Only one miss can be issued; the cache cannot be used until the miss completes

• The external logic is responsible for writebacks 

• The external logic must maintain cache coherence for both backup and primary caches 

A Valid, Dirty, and Shared bit are associated with each tag in the external backup cache. The
Valid and Shared bits are written by external system logic only. When not in "PV" mode the
Dirty bit is written by NVAX Plus on write hits to a non-shared block and indicates the data in
cache is no longer the same as main memory. For writes to Shared blocks NVAX Plus cannot
write directly into the cache, and must issue a WRITE_BLOCK command to enable the external
system logic to broadcast the shared write to all caches in the system.

13.2 CBOX REGISTERS 

13.2.1 BIU_ADDR 

This read-only register contains bits [31..5] of the physical address associated with any errors
reported in BIU_STAT[7..0]. The BIU_ADDR is locked against further updates, until the error 
bits of BIU_STAT are cleared. 



Figure 13-1: BIU_ADDR

  Bits 31..5: ADDR[31..5]
  Bits 4..0:  X



13.2.2 BIU_STAT 

BIU_STAT is a write-one-to-clear (W1C) IPR. When one of BIU_HERR, BIU_SERR,
BC_TPERR or BC_TCPERR is set, BIU_STAT[6..0] are locked against further updates, and
the address associated with the error is latched and locked in the BIU_ADDR register.
BIU_STAT[7..0] and BIU_ADDR are unlocked when BIU_STAT[7,3:0] are written with 1's.

When FILL_ECC or FILL_DPERR is set, BIU_STAT[13..8] are locked against further updates,
and the address associated with the error is latched and locked in the FILL_ADDR register.
BIU_STAT[14..8] and FILL_ADDR are unlocked when BIU_STAT[14,11:8] are written with 1's.

This register is not unlocked or cleared by reset and needs to be explicitly cleared by microcode.

Figure 13-2: BIU_STAT

  Bit-field diagram. Fields, from bit 0 upward: BIU_HERR, BIU_SERR, BC_TPERR, BC_TCPERR,
  BIU_DSP_CMD, BIU_SEO, FILL_ECC, FILL_CRD, FILL_DPERR, FILL_IRD, FILL_QW, FILL_SEO,
  FILL_DSP_CMD, LOST_WRITE, BIU_ADDR[33:32], FILL_ADDR[33:32].

Table 13-1: BIU_STAT

Name              Bit(s)  Type  Description

BIU_HERR          0       W1C   This bit, when set, indicates that an external cycle was terminated
                                with the cAck_h pins indicating HARD_ERROR.

BIU_SERR          1       W1C   This bit, when set, indicates that an external cycle was terminated
                                with the cAck_h pins indicating SOFT_ERROR.

BC_TPERR          2       W1C   This bit, when set, indicates that an external cache tag probe
                                encountered bad parity in the tag address RAM.

BC_TCPERR         3       W1C   This bit, when set, indicates that an external cache tag probe
                                encountered bad parity in the tag control RAM.

BIU_DSP_CMD       6:4     RO    This field latches DSP_CMD[3..1] (dispatch command bits [3..1]),
                                inverting bit [1] if the command is write_unlock, when a BIU_HERR,
                                BIU_SERR, BC_TPERR, or BC_TCPERR error occurs, and locks until
                                BIU_STAT[7,3:0] are cleared.

BIU_SEO           7       W1C   This bit, when set, indicates that an external cycle was terminated
                                with the cAck_h pins indicating HARD_ERROR or that an external
                                cache tag probe encountered bad parity in the tag address RAM
                                or the tag control RAM while one of BIU_HERR, BIU_SERR,
                                BC_TPERR, or BC_TCPERR was already set.

FILL_ECC          8       W1C   ECC error. This bit, when set, indicates that primary cache fill data
                                received from outside the CPU chip contained an ECC error.

FILL_CRD          9       W1C   Corrected read. This bit is only meaningful when FILL_ECC is also
                                set. FILL_CRD is set to indicate that the ECC error was correctable
                                and clear to indicate that the error was not correctable.

FILL_DPERR        10      W1C   BIU Parity Error. This bit, when set, indicates that the BIU received
                                data with a parity error from outside the CPU chip while performing
                                either a Dcache or Icache fill. FILL_DPERR is only meaningful when
                                the CPU chip is in parity mode, as opposed to ECC mode.

FILL_IRD          11      RO    This bit is only meaningful when either FILL_ECC or FILL_DPERR
                                is set. FILL_IRD is set to indicate that the error which caused
                                FILL_ECC or FILL_DPERR to set occurred during an Icache fill and clear
                                to indicate that the error occurred during a Dcache fill. It locks
                                until BIU_STAT[14,10:8] are cleared.

FILL_QW           13:12   RO    This field is only meaningful when either FILL_ECC or FILL_DPERR
                                is set. FILL_QW identifies the quadword within the hexaword
                                primary cache fill block which caused the error. It can be used
                                together with FILL_ADDR[33..5] to get the complete physical
                                address of the bad quadword. FILL_QW locks until BIU_STAT[14,10:8]
                                are cleared.

FILL_SEO          14      W1C   This bit, when set, indicates that a primary cache fill operation
                                resulted in either an uncorrectable ECC error or in a parity error
                                while FILL_ECC or FILL_DPERR was already set.

FILL_DSP_CMD      19:16   RO    This field latches the DSP_CMD (dispatch command) which resulted
                                in the BIU error and locks until BIU_STAT[14,10:8] are cleared.

LOST_WRITE        20      W1C   A second error occurred, and the command is a write.

BIU_ADDR[33:32]   29:28   RO    Bits 33,32 of the BIU_ADDR register; should be set only for I/O
                                space addresses. The field is locked against further updates when
                                BIU_ADDR[31..5] is locked.

FILL_ADDR[33:32]  31:30   RO    Bits 33,32 of the FILL_ADDR register; should be set only for I/O
                                space addresses. The field is locked against further updates when
                                FILL_ADDR[31..5] is locked.

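To make the BIU_STAT layout concrete, the sketch below decodes a register value into named fields and rebuilds the physical address of a bad quadword from FILL_ADDR and FILL_QW. It is an illustration, not diagnostic code from the specification: the function names are hypothetical, the bit positions of BIU_HERR through BC_TCPERR, BIU_SEO, and FILL_ECC are assumed from the W1C bit groups named in the text, and FILL_QW is assumed to supply address bits <4:3> (the quadword within the 32-byte hexaword).

```python
# (lsb, width) for each BIU_STAT field. Positions marked "assumed" are
# inferred from the surrounding text rather than stated in the table.
BIU_STAT_FIELDS = {
    "BIU_HERR": (0, 1),          # assumed
    "BIU_SERR": (1, 1),          # assumed
    "BC_TPERR": (2, 1),          # assumed
    "BC_TCPERR": (3, 1),         # assumed
    "BIU_DSP_CMD": (4, 3),
    "BIU_SEO": (7, 1),           # assumed
    "FILL_ECC": (8, 1),          # assumed
    "FILL_CRD": (9, 1),
    "FILL_DPERR": (10, 1),
    "FILL_IRD": (11, 1),
    "FILL_QW": (12, 2),
    "FILL_SEO": (14, 1),
    "FILL_DSP_CMD": (16, 4),
    "LOST_WRITE": (20, 1),
    "BIU_ADDR_33_32": (28, 2),
    "FILL_ADDR_33_32": (30, 2),
}


def decode_biu_stat(value):
    """Split a 32-bit BIU_STAT value into a dict of field values."""
    return {
        name: (value >> lsb) & ((1 << width) - 1)
        for name, (lsb, width) in BIU_STAT_FIELDS.items()
    }


def bad_quadword_address(fill_addr, biu_stat):
    """Combine FILL_ADDR[31..5], FILL_ADDR[33:32] and FILL_QW into an address."""
    fields = decode_biu_stat(biu_stat)
    hexaword = (fields["FILL_ADDR_33_32"] << 32) | (fill_addr & 0xFFFFFFE0)
    return hexaword | (fields["FILL_QW"] << 3)
```

The result is only meaningful when FILL_ECC or FILL_DPERR is set, as the table notes.
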

Command           FILL_DSP_CMD<3:0>   BIU_DSP_CMD<2:0>

DREAD             100X                100
DREAD_IO          1010                101
DREAD_LOCK        1100                110
DREAD_LOCK_IO     1101                110
IREAD             0010                001
IREAD_IO          0011                001
WRITE_UNLOCK      0111                011
WRITE             0110                010
IO_WRITE          0101                010
WRITE_UNLOCK_IO   0001                000

13.2.3 FILL_ADDR

This read-only register contains bits [31..5] of the physical address associated with any errors
reported in BIU_STAT[14..8]. FILL_ADDR is locked against further updates until
BIU_STAT[14,10:8] are cleared.

Figure 13-3: FILL_ADDR

  Bits 31..5: FILL_ADDR[31..5]
  Bits 4..0:  X

13.2.4 BIU_CTL

BIU_CTL is cleared by power-up microcode, except for the "PV" bit which is set to 1 by the
power-up microcode.

NOTE: NVAX Plus exits reset microcode with "PV" = 1, in PV mode.

NOTE: The BIU_CTL (and DIAG_CTL) registers read inverted values.

Figure 13-4: BIU_CTL

  Bit-field diagram. Fields shown, from bit 0 upward: BC_ENA, ECC, OE, BC_FHIT, BC_SPD,
  PCACHE_MODE, QW_IO_RD, IO_MAP, BC_SIZE.

  X bits read values from DIAG_CTL




Table 13-2: BIU Control Register

Name          Bit(s)  Type  Description

BC_ENA        0       RW    External cache enable. When clear, this bit disables the external
                            cache. When the external cache is disabled the BIU does not probe
                            the external cache tag store for read and write references; it launches
                            a request on cReq_h immediately.

ECC           1       RW    When this bit is set NVAX Plus generates/expects ECC on the check_h
                            pins. When this bit is clear NVAX Plus generates/expects parity
                            on four of the check_h pins.

OE            2       RW    When this bit is set NVAX Plus does not assert its chip enable pins
                            during RAM write cycles, thus enabling these pins to be connected
                            to the output enable pins of the cache RAMs.

BC_FHIT       3       RW    External cache force hit. When this bit is set and BC_ENA is also
                            set, all pin bus READ_BLOCK and WRITE_BLOCK transactions
                            are forced to hit in the external cache. Tag and tag control parity
                            are ignored when the BIU operates in this mode. BC_ENA takes
                            precedence over BC_FHIT. When BC_ENA is clear and BC_FHIT is
                            set no tag probes occur and external requests are directed to the
                            cReq_h pins.

BC_SPD        5:4     RW,0  External cache speed. This field indicates to the BIU the read and
                            write access time of the RAMs used to implement the off-chip
                            external cache, measured in CPU cycles. Bcache speeds of 2, 3, or 4
                            times the CPU_clk are available. The cache speed field is hardware
                            reset to the 2X cpu cycle setting.

                            NVAX Plus replaced BC_RD_SPD and BC_WR_SPD with BC_SPD.
                            NVAX Plus uses the BC_SPD field to program the read and write
                            cache access time. EVAX allows the read and write cache access
                            times to be programmed separately. BC_SPD is initialized on reset
                            to the 2X cpu cycle setting.

BC_WE_CTL     -       RW    External cache write enable control. This field is used to control the
                            timing of the write enable and chip enable pins during writes into
                            the data and tag control RAMs. This field will be set to a fixed value
                            for NVAX Plus. This field is programmable on EVAX.

PCACHE_MODE   8       RW    When this bit is clear the Pcache is allocated as two way set
                            associative, and when set the Pcache allocates as direct mapped.

QW_IO_RD      9       RW    When this bit is set IO_SPACE DREADs which are not quadword
                            aligned return data from an internal register which contains
                            bits <63:32> of the previous quadword aligned read.

"PV"          10      RW    Set for low cost workstations. Byte parity on reads, cWMask[5] is
                            addr[2] on reads, check bits remain tristated on writes, all writes
                            are done as if the Bcache is disabled, cWMask[7..0], dataA_h[3],
                            dataWE_h[1..0] contain byte mask info for writes. The "PV" field is
                            hardware set to "PV" mode at reset. Systems other than "PV" must
                            clear BIU_CTL<"PV"> from SROM code before executing external
                            reads or writes.

IO_MAP        14:13   RW    These bits are driven to Adr_h[33:32] on IO references, allowing
                            different systems to select the range for IO mapping.

BC_SIZE       30:28   RW    This field is used to indicate the size of the external cache.
                            BC_SIZE is not initialized on reset and must be explicitly written before
                            enabling the external cache. See Table 13-4 for the encodings.

BC_PA_DIS     -       -     This field has been removed on NVAX Plus.

WS_IO         31      RW    This bit, when set, indicates that IO-space is mapped for "FLAMINGO"
                            workstations.

"PV" systems maintain a write-through cache with byte parity. The cache is not written by
NVAX Plus; all writes and byte/word writes issue a WRITE_BLOCK to the system. The LW
parity generated by NVAX Plus is not used for "PV" writes.

If BIU_CTL<"PV"> = 1, check_h<27:0> output drivers remain tristated at all times, allowing the
system parity generator logic to drive parity into the Bcache during write_block and STxC cycles.
check_h[27:25, 20:18, 13:11, 6:4] are not used and need to be driven.

System logic constructs a byte enable for each of the 16 possible bytes from cWMask<7:0>,
dataA_h<3>, and dataWE_h<1:0>, and generates byte parity. Fast external reads are executed
for read hits, with byte parity driven to the check bits.

For BIU_CTL<"PV"> = 1, writes do not probe the Bcache. Writes go directly to WRITE_BLOCK,
and output a byte mask on cWMask<7:0>. dataA_h<3> identifies the QW to which the cWMask lines
apply, and dataWE_h<1:0> output byte mask information for the other QW of data.



dataA_h<3>   dataWE_h<1:0>   bytemask<15:8>   bytemask<7:0>

0            00              00000000         cWMask<7:0>
0            01              00001111         cWMask<7:0>
0            10              11110000         cWMask<7:0>
0            11              11111111         cWMask<7:0>
1            00              cWMask<7:0>      00000000
1            01              cWMask<7:0>      00001111
1            10              cWMask<7:0>      11110000
1            11              cWMask<7:0>      11111111

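The byte mask table above can be expressed as a short function that system logic designers might use as a reference model. This is a sketch under the assumptions of that table only; the function and constant names are hypothetical.

```python
# Expansion of dataWE_h<1:0> into the byte mask for the "other" quadword,
# as given in the table above.
DATAWE_TO_MASK = {0b00: 0x00, 0b01: 0x0F, 0b10: 0xF0, 0b11: 0xFF}


def byte_mask16(cwmask, data_a3, data_we):
    """Build bytemask<15:0> from cWMask<7:0>, dataA_h<3> and dataWE_h<1:0>."""
    cw = cwmask & 0xFF
    other = DATAWE_TO_MASK[data_we & 0b11]
    if data_a3:
        # cWMask applies to the upper quadword, dataWE to the lower one.
        return (cw << 8) | other
    # cWMask applies to the lower quadword, dataWE to the upper one.
    return (other << 8) | cw
```

For instance, with dataA_h<3> = 0, dataWE_h<1:0> = 01 and cWMask = A5 (hex), the model yields a 16-bit mask of 0FA5 (hex).
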


Reads probe the Bcache; byte parity is input as

check_h[0] for data[7:0],      check_h[1] for data[15:8],     check_h[2] for data[23:16],    check_h[3] for data[31:24],
check_h[7] for data[39:32],    check_h[8] for data[47:40],    check_h[9] for data[55:48],    check_h[10] for data[63:56],
check_h[14] for data[71:64],   check_h[15] for data[79:72],   check_h[16] for data[87:80],   check_h[17] for data[95:88],
check_h[21] for data[103:96],  check_h[22] for data[111:104], check_h[23] for data[119:112], check_h[24] for data[127:120],

where check_h[3:0] are xored to produce the LW parity bit for data[31:0],
check_h[10:7] are xored to produce the LW parity bit for data[63:32],
check_h[17:14] are xored to produce the LW parity bit for data[95:64],
check_h[24:21] are xored to produce the LW parity bit for data[127:96].

The dataWE lines are only used for mask information in "PV" mode.

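The reduction from byte parity to longword parity described above is a four-way XOR per longword. The following sketch shows that reduction for a 28-bit check_h value; the function name and the list-of-four return format are assumptions made for illustration.

```python
# check_h pin numbers carrying byte parity, grouped four per longword,
# as listed in the text above.
BYTE_PARITY_PINS = (
    (0, 1, 2, 3),       # data[31:0]
    (7, 8, 9, 10),      # data[63:32]
    (14, 15, 16, 17),   # data[95:64]
    (21, 22, 23, 24),   # data[127:96]
)


def lw_parity_bits(check_h):
    """Return the four LW parity bits (LW0 first) derived from byte parity."""
    result = []
    for pins in BYTE_PARITY_PINS:
        parity = 0
        for pin in pins:
            parity ^= (check_h >> pin) & 1
        result.append(parity)
    return result
```
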


Table 13-3: BC_SPD

BC_SPD   DRV_CLK/Cache Speed

00       2X cpu cycle
01       3X cpu cycle
10       4X cpu cycle

Table 13-4: BC_SIZE

BC_SIZE   Size

000       128 Kbytes
001       256 Kbytes
010       512 Kbytes
011       1 Mbytes
100       2 Mbytes
101       4 Mbytes
110       8 Mbytes



13.2.5 DIAG_CTL

DIAG_CTL is cleared by power-up microcode, except for the DISABLE_ECC_ERR bit, which
is set to 1 by the power-up microcode.

NOTE

NVAX Plus exits reset microcode with DISABLE_ECC_ERR = 1. System software
must clear DIAG_CTL<DISABLE_ECC_ERR> to enable ECC/parity checking.

NOTE

The BIU_CTL (and DIAG_CTL) registers read inverted values.



Figure 13-5: DIAG_CTL

[Register layout, bits 31..0. Named fields: SW_ECC, PM_ACCESS_TYPE, PM_HIT_TYPE,
DISABLE_ECC_ERR, MAB_EN, PACK_DISABLE, TODR_INC, TODR_TEST. Bit positions are listed in
Table 13-5. X bits read values from BIU_CTL.]

Table 13-5: Diagnostic Control Register

Name              Bit(s)   Type   Description

TODR_TEST         6        RW     Enables TODR test mode.

TODR_INC          7        RW     Increment TODR for test purposes.

PACK_DISABLE      11       RW     Diagnostic feature to disable write packing, except for QW packing
                                  directed by microcode.

MAB_EN            12       RW,0   Diagnostic feature to allow tagAdr[33:32] to output MAB[7:6] and
                                  tagAdr[17,18,19] to output MAB[10:8] depending on Bcache size.
                                  This bit is cleared at reset to ensure tagAdr[33:32] and
                                  tagAdr[17,18,19] are not driven unless enabled by software.

DISABLE_ECC_ERR   15       RW,1   The reporting of ECC/Data Parity errors is disabled when set.

PM_HIT_TYPE       23:21    RW     Selects Bcache tag compare type for Performance Monitor selection
                                  of C%PMUX1.

PM_ACCESS_TYPE    26:24    RW     Selects Bcache tag compare type for Performance Monitor selection
                                  of C%PMUX0.

SW_ECC            27       RW     This bit, when set, enables the use of ECC check bits from
                                  IPR_BEDECC as given by software for write data. If DIAG_CTL[1] =
                                  '0 (i.e. parity mode) and SW_ECC is set, BEDECC[0] is the parity
                                  bit generated for data[31:0] and BEDECC[7] is the parity bit
                                  generated for data[63:32].

NOTE

NVAX Plus does not support BAD_TCP, the write bad tag control parity function
which is implemented by EV4.

13.2.6 FILL_SYNDROME

The FILL_SYNDROME register is a 14-bit read-only register. If the chip is in ECC mode and
an ECC error is recognized during a primary cache fill operation, the syndrome bits associated
with the bad quadword are locked in the FILL_SYNDROME register. The FILL_SYNDROME
register is locked against further updates until BIU_STAT[14,10:8] are cleared.

Figure 13-6: FILL_SYNDROME

 31                        14 13          7 6           0
| 0 ... 0                    | HI[6..0]    | LO[6..0]    |

Table 13-6: Fill Syndrome

Name   Bit(s)   Type   Description

LO     6:0      RO     The LO field latches the ECC syndrome bits for the low longword.

HI     13:7     RO     The HI field latches the ECC syndrome bits for the high longword.

13.2.7 BEDECC 

The BEDECC register is a 14-bit write-only register. If BIU_CTL[SW_ECC] = 1 the check bits
for write data are sourced from BEDECC instead of the normal check bit generation logic.

Figure 13-7: BEDECC

 31                        14 13          7 6           0
|                            | HI[6..0]    | LO[6..0]    |

Table 13-7: BEDECC

Name   Bit(s)   Type   Description

LO     6:0      WO     The LO field for check bits of data[31:0].
HI     13:7     WO     The HI field for check bits of data[63:32].
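
FILL_SYNDROME and BEDECC share the same two-field layout (LO in bits 6:0, HI in bits 13:7). A minimal sketch of packing and unpacking that layout follows. The helper names are hypothetical, and how software actually issues the MTPR/MFPR is outside the scope of this sketch.

```python
# Sketch of the shared HI/LO layout of FILL_SYNDROME and BEDECC.
# Helper names are illustrative assumptions, not part of the specification.

FIELD_MASK = 0x7F  # each field is 7 bits wide


def pack_check_bits(lo: int, hi: int) -> int:
    """Build a 14-bit BEDECC value from the LO and HI check-bit fields."""
    if not (0 <= lo <= FIELD_MASK and 0 <= hi <= FIELD_MASK):
        raise ValueError("LO and HI must each fit in 7 bits")
    return (hi << 7) | lo


def unpack_check_bits(value: int) -> tuple[int, int]:
    """Split a FILL_SYNDROME (or BEDECC-format) value into (LO, HI)."""
    return value & FIELD_MASK, (value >> 7) & FIELD_MASK
```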





13.2.8 BC_TAG

The BC_TAG is a read-only IPR. Unless locked, the BC_TAG register is loaded with the results
of every backup cache tag probe. When a tag or tag control parity error or primary fill data
error (parity or ECC) occurs, this register is locked against further updates. Software may read
this register by using the MFPR instruction. The BC_TAG register is unlocked when
BIU_STAT[7,3:2] are cleared.

The BC_TAG register for NVAX Plus stores the tag error information in different bit positions
than EV4, maintaining the alignment of the tag in the address data path. BC_TAG<22:17>
are used depending upon the BIU_CTL<BC_SIZE> field specifying the Bcache size.
BC_TAG<TAG_MATCH> indicates the address and TAG fields for the BC_SIZE were equal.

[Figure 13-8: BC_TAG register layout, bits 31..0. The register holds TAG[31..17] and six
read-only bits: TAG_MATCH, TAGCTL_V, TAGCTL_D, TAGCTL_S, TAGCTL_P, and TAG_P. The
remaining bits read as zero.]



13.2.9 STxC_RESULT 

Bit 2 of STxC_RESULT, STxC P/F is read only. **When a write is issued to this IPR address 
AC(hex) the IREAD latch lockout as a result of a failed READ LOCK is cleared.** Bit 2 is set if 
the last store conditional failed, and is reset if the last store conditional did not result in a STxC 
FAIL. This register is read by microcode following write_unlocks to determine if the write was 
successful. Bits [1:0] must be read as zero. 



Figure 13-9: STxC_RESULT

 31                                  3    2    1   0
| 0 ... 0                              | RO | 0 | 0 |
                                          |    |   |
                                          |    |   +-- read as zero
                                          |    +------ read as zero
                                          +----------- STxC P/F



13.2.10 SIO 

Bit 0 is read-only. The level of the serial line/SROM INPUT data input pin is read. Bit 1 is 
write_only and drives to the serial line output/SROM CLOCK output pin. The level driven to the 
pin is inverted from that written to the SIO register. 



Figure 13-10: SIO

 31                                  2   1   0
| 0 ... 0                              |   |   |
                                         |   |
                                         |   +-- serial line in
                                         +------ serial line out



13.2.11 SOE-IE 

Bit 0 is write only and drives the SROM_OE pin. Bit 1 is read only and receives the icMode_h<0> 
(SROM_FAST) pin latched at the trailing edge of reset_l which determines if a SROM is to be 
read. Bits 22 to 20 are read only and are coded with the wafer column position. Bits 26 to 23 are 
read only and are coded with the wafer row position. Bits 31 to 27 are read only and are coded 
with a Wafer ID number. 



Figure 13-11: SOE-IE

 31              20 19               2   1   0
| WAFER/ROW/COL ID | 0 ... 0           |   |   |
                                         |   |
                                         |   +-- SROM_OE
                                         +------ SROM_FAST
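
As a reading aid, the read-only fields of SOE-IE can be extracted as sketched below. The function and key names are hypothetical. Bit 0 (SROM_OE) is write only and is therefore not decoded.

```python
# Sketch: extract the read-only SOE-IE fields described in the text.
# decode_soe_ie and the dictionary keys are illustrative names only.

def decode_soe_ie(value: int) -> dict:
    """Return the read-only fields of a 32-bit SOE-IE register value."""
    return {
        "srom_fast": (value >> 1) & 0x1,    # bit 1: icMode_h<0> latched at reset
        "wafer_col": (value >> 20) & 0x7,   # bits 22:20: wafer column position
        "wafer_row": (value >> 23) & 0xF,   # bits 26:23: wafer row position
        "wafer_id": (value >> 27) & 0x1F,   # bits 31:27: wafer ID number
    }
```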



13.2.12 QW_PACK

This is a write-only IPR used by microcode to inform the WRITE_PACKER to pack the next two
LW writes even if the address is in I/O space or the command is a WRITE_UNLOCK. The IPR_WR
takes place during a MTPR MAILBOX instruction and a MTPR QW_PACK(B8) instruction to
produce QW writes to I/O space.

13.2.13 CLR_IO_PACK

This is a write-only IPR used by microcode to inform the WRITE_PACKER to clear the quadword
pack state. The IPR_WR takes place during a MTPR MAILBOX instruction and a MTPR
CLR_IO_PACK(B9) instruction.

13.2.14 CONSOLE HALT/CHALT

This R/W register contains the start address for the console. It is written by system software,
and used to determine the console start physical address in response to a HALT interrupt.

NOTE

If the console code resides in I/O space, a full quadword of data must be received
for each READ_BLOCK.

13.2.15 Time-of-Day Register (TODR)

The Time-of-Day Register forms an unsigned 32-bit binary counter that is driven from a 100Hz 
oscillator, so that the least significant bit of the clock represents a resolution of 10 milliseconds. 
The R/W register counts only when it contains a non-zero value. 

Figure 13-12: Time of Day Register, TODR

 31                                                                0
| initial value plus number of 10-millisecond units since setting   | : TODR



13.2.16 Programmable Interval Clock 

The interval clock provides an interrupt at IPL 16 (hex) at programmed intervals. The counter is 
incremented at 1 microsecond intervals, with at least .01% accuracy. The interval clock consists 
of three registers in the privileged register space. 

1. Interval Count Register (ICR) - The interval count register is a read only register incremented 
every microsecond. Upon a carry out (overflow) from bit 31, it is automatically loaded from 
NICR and an interrupt is generated if the interrupt is enabled. That is, the value of ICR on 
successive microseconds will be FFFFFFFD (hex), FFFFFFFE, FFFFFFFF, <value of NICR>. 

2. Next Interval Count Register (NICR) - This reload register is a write only register that holds 
the value to be loaded into ICR when ICR overflows. The value is retained when ICR is 
loaded. 

3. Interval Clock Control Status Register (ICCS) - The ICCS register contains control and status
information for the interval clock.

The interval clock consists of 3 Internal Processor Registers configured as follows:

Figure 13-13: ICCS

  31   30            8    7    6    5    4   3     1    0
| WC | 0 ... 0         | WC | RW | WO | WO | 0 0 0   | RW |

Bit 0:  RUN
Bit 4:  XFR
Bit 5:  SGL
Bit 6:  IE
Bit 7:  INT
Bit 31: ERR

13.2.17 Interval Clock Control Register 

When bit <0>, the RUN bit, is a 1, the Interval Count Register is incremented once per
microsecond. When clear, ICR does not increment automatically. RUN is cleared during reset.

Bits <3:1> must be zero.

Writing a 1 to bit <4> (XFR) generates a pulse which causes the Next Interval Count Register
to be copied to the Interval Count Register. XFR does not require clearing; multiple XFRs will
produce multiple transfers. XFR is always read as 0.

When RUN is a 0, writing a 1 to bit <5> (SGL) generates a pulse which causes the Interval Count
Register to be incremented by 1. If SGL is written and RUN is a 1, or XFR is written at the same
time, the result is unpredictable. SGL does not require clearing; multiple SGLs will produce
multiple increments. SGL is always read as 0.

When Bit <6> IE is set, an interrupt request is generated every time ICR overflows (every time 
Interrupt is set). When clear, no interrupt is requested. Similarly, if Interrupt is already set and 
the software sets Interrupt Enable, an interrupt is generated. That is, an interrupt is generated 
whenever the function [Interrupt Enable AND Interrupt] changes from 0 to 1. Interrupt Enable 
is cleared by reset. 

Whenever the Interval Count Register overflows, bit <7> (INT) is set. If IE is set when INT is 
set, an interrupt is posted. For the case in which the NICR contains a value of FFFFFFFF and 
the ICR overflows, consecutive interrupts are not posted. 

Whenever the Interval Count Register overflows and INT is already set, ERR (bit <31>) is set. 
Thus, ERR indicates a missed overflow. 

Reset clears ICCS <6> and <0>, and leaves the rest of ICCS unpredictable. 



Figure 13-14: ICR

 31                                                    0
|                   INTERVAL COUNT                      |

Interval Count Register, Read Only



13.2.18 Interval Count Register 

This read-only register contains the interval count. When the RUN bit is a zero, writing a 1 to 
SGL increments the register. When RUN is a 1, the register is incremented once per microsecond. 
When the counter overflows, the INT bit is set, and an interrupt is posted if IE is set. The register 
is then loaded from the Next Interval Count Register and continues incrementing. The maximum 
delay that can be specified is approximately 1.2 hours. 

Figure 13-15: NICR

 31                                                    0
|                NEXT INTERVAL COUNT                    |

Next Interval Register, Write Only



13.2.19 Next Interval Count Register 

This contains the value which is loaded into the Interval Count Register after an overflow, or in 
response to a 1 written to XFR. 

The Interval Count Register is cleared by reset. 

To use the Interval Clock, load the negative (2's complement) of the desired interval into the Next
Interval Count Register. Then, writing 51 (hex) to the ICCS will enable interrupts, load the Next
Interval into the Interval Count Register, and set the RUN bit. An interrupt will then occur
every "interval count" microseconds. The interrupt routine should write C1 (hex) to the ICCS to
clear the interrupt. If Interrupt has not been cleared (the interrupt has not been handled) by the
time of the next ICR overflow, Error will be set.

If NICR is written while the clock is running, the clock may lose or add a few ticks. If the interval 
clock interrupt is enabled, this may cause the loss of an interrupt. 
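
The programming sequence above reduces to a small amount of arithmetic, sketched here in Python. The constant and function names are hypothetical. The bit positions come from the ICCS description, and the values 51 (hex) and C1 (hex) fall out of OR-ing the named bits.

```python
# Sketch of the interval clock arithmetic described above.
# Constant and function names are illustrative, not defined by the spec.

ICCS_RUN = 1 << 0   # count once per microsecond
ICCS_XFR = 1 << 4   # copy NICR into ICR
ICCS_SGL = 1 << 5   # single-step ICR (only when RUN is 0)
ICCS_IE = 1 << 6    # interrupt enable
ICCS_INT = 1 << 7   # overflow seen; the handler writes a 1 here to clear it
ICCS_ERR = 1 << 31  # overflow occurred while INT was still set

ICCS_START = ICCS_IE | ICCS_XFR | ICCS_RUN   # value written to start the clock
ICCS_ACK = ICCS_INT | ICCS_IE | ICCS_RUN     # value written by the handler


def nicr_for_interval(microseconds: int) -> int:
    """Return the 32-bit two's complement of the desired interval."""
    if not 1 <= microseconds <= 2**32:
        raise ValueError("interval must be between 1 and 2**32 microseconds")
    return (-microseconds) & 0xFFFFFFFF


def ticks_until_overflow(nicr: int) -> int:
    """Number of 1-microsecond increments before ICR carries out of bit 31."""
    return 2**32 - (nicr & 0xFFFFFFFF)
```

A 10 ms tick, for instance, would use `nicr_for_interval(10_000)`. The longest interval, 2**32 microseconds, corresponds to the roughly 1.2 hours quoted for the Interval Count Register.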

13.3 Cache Organization 

Pins for tagAdr_h<31:17> are allocated, allowing the cache size to be as small as 128 Kbytes.
The BC_SIZE field of the BIU_CTL register determines which bits of tagAdr_h<22:17> are to be
included in the match comparison.

NVAX Plus cache cycles are 2, 3, or 4 times the internal cpu_clk cycle time. ISSUE: SET BY IRQ
AT RESET OR IN BIU_CTL.
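
One way to picture how BC_SIZE selects the compared tag bits is sketched below. It assumes that the tag covers every address bit at or above log2 of the cache size, so a 128-Kbyte cache compares bits <31:17> and an 8-Mbyte cache compares bits <31:23>. That assumption and all function names are illustrative and are not stated in this section.

```python
# Sketch: which tag bits take part in the match for each BC_SIZE code.
# Assumes the tag starts at bit log2(cache size); names are hypothetical.

def bcache_size_bytes(bc_size: int) -> int:
    """Decode the 3-bit BC_SIZE field (Table 13-4) into a size in bytes."""
    if not 0 <= bc_size <= 6:
        raise ValueError("BC_SIZE codes 0 through 6 are defined")
    return (128 * 1024) << bc_size


def tag_compare_bits(bc_size: int) -> tuple[int, int]:
    """Return (high, low) bit numbers of the address bits that are compared."""
    low = bcache_size_bytes(bc_size).bit_length() - 1
    return 31, low


def tag_match(address: int, tag: int, bc_size: int) -> bool:
    """Compare address and tag over the bits selected by BC_SIZE."""
    high, low = tag_compare_bits(bc_size)
    mask = ((1 << (high - low + 1)) - 1) << low
    return (address & mask) == (tag & mask)
```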

13.4 Cache_Speed and SYS_CLK 

NVAX Plus cache accesses are 2, 3, or 4 times the CPU_CLK period.

Transactions requiring system logic intervention are referenced to SYS_CLK, which is separately
programmable, also at 2, 3, or 4 times the CPU_CLK period. For systems in which cache_speed
and SYS_CLK are both 2 times the CPU cycle, SYS_CLK lags the cache access by one CPU cycle,
allowing the fastest transfer of commands to the system.

13.5 DataPath 



Table 13-8: Cbox Queues and Major Latches

Queue/Latch       Entries   Address/Data              Function

CM_OUT_LATCH      1         Addr<31:3>,data<63:0>     Holds fill data or an invalidate address
                                                      being sent to the Mbox.
FILL_DATA_PIPEs   2         Data<63:0>                Pipeline data destined for the Mbox.
DREAD_LATCH       1         Addr<33:3>                Holds a data-stream read request from
                                                      the Mbox.
IREAD_LATCH       1         Addr<33:3>                Holds an instruction-stream read request
                                                      from the Mbox.
WRITE_PACKER      1         Addr<33:3>,data<63:0>     Compresses sequential memory writes to
                                                      the same quadword.
WRITE_QUEUE       8         Addr<33:3>,data<63:0>     Queues write requests from the Mbox.
INVADR_LATCH      1         iAddr<12:5>               Holds address for Pcache invalidates.
INPUT_LATCH       2         Data<127:0>               Holds input data from the BD_DATA bus.
OUTPUT_LATCH      1         Data<127:0>               Holds output data to be driven onto the
                                                      BD_DATA bus.

13.6 Mbox Interface

All NVAX Plus CPU chip transactions for the Cbox arrive through the Cbox-Mbox interface. 
Reads come from the Mbox to the Cbox through the read latches. Writes arrive through the 
WRITE_PACKER and the WRITE_QUEUE. All fills returning from the Cbox to the Mbox go
through the CM_OUT_LATCH.

A block diagram of the Mbox interface is shown in Figure 13-16. 
Figure 13-16: Mbox Interface

[Block diagram. Elements: CM_ADDR_LATCH, CM_DATA_LATCH, CM_OUT_LATCH, FILL_DATA_PIPE1,
FILL_DATA_PIPE2, DREAD_LATCH, IREAD_LATCH, WRITE_PACKER, WRITE_QUEUE (8 entries).
Signals: C%CBOX_ADDR_H<31:5> (INVALS), C%MBOX_FILL_QW_H<4:3>, B%S6_DATA_H<63:0>,
C_BUS%DBUS_H<63:0>, C_ADC%ABUS_H<31:0>, M%S6_PA_H<31:3>, M%C_S6_PA_H<2:0>.]



When the Mbox has a command for the Cbox, the command appears on M%S6_CMD<4:0>.
M%CBOX_REF_ENABLE or M%CBOX_LATE_EN_H is asserted for all reads, IPR_RDs, and
IPR_WRs. M%CBOX_LATE_EN_H is only used for transactions which may hit in the Pcache
(DREADs, IREADs, and READ_MODIFYs). Neither M%CBOX_REF_ENABLE nor
M%CBOX_LATE_EN_H is asserted for writes, since the Cbox accepts all writes from the Mbox.
The Cbox loads the address from M%S6_PA<31:3> into either the IREAD_LATCH, the DREAD_LATCH, or
the WRITE_PACKER. If the command is a write, the Cbox loads the data from B%S6_DATA and
the byte enable from M%S6_BYTE_MASK into the WRITE_PACKER.

Table 13-9 shows the commands which pass between the Mbox and the Cbox.

Table 13-9: Mbox-Cbox Commands

Command           Description                           Cbox datapath element involved

Mbox to Cbox commands driven on M%S6_CMD<4:0>

IREAD (1)         Instruction stream read               IREAD_LATCH
DREAD (1)         Data stream read                      DREAD_LATCH
DREAD_MODIFY (1)  Data stream read with modify intent   DREAD_LATCH
DREAD_LOCK (1)    Interlocked data stream read          DREAD_LATCH
WRITE_UNLOCK      Write which releases lock             WRITE_PACKER, WRITE_QUEUE
WRITE             Normal write                          WRITE_PACKER, WRITE_QUEUE
IPR_RD (1)        Read of an internal or external       DREAD_LATCH
                  processor register
IPR_WR (1)        Write of an internal or external      WRITE_PACKER, WRITE_QUEUE
                  processor register

Cbox to Mbox commands driven on C%CBOX_CMD<1:0>

D_CF              Data stream cache fill                CM_OUT_LATCH
I_CF              Instruction stream cache fill         CM_OUT_LATCH
INVAL             Hexaword invalidate                   CM_OUT_LATCH
NOP               No operation.

(1) Qualified by M%CBOX_REF_ENABLE or M%CBOX_LATE_EN_H.



13.6.1 The IREAD_LATCH and the DREAD_LATCH

When the Mbox has a read command for the Cbox, the Cbox loads the address from
M%S6_PA<31:3> into either the IREAD_LATCH or the DREAD_LATCH, depending on the command.
If M%S6_PA<31:29> = '111, bits <33:32> are set to '11; otherwise they are set to '00. Only
IREADs are loaded into the IREAD_LATCH. The DREAD_LATCH is used for DREAD, DREAD_MODIFY,
DREAD_LOCK, and IPR_READ.

The Mbox only has one outstanding IREAD and one outstanding DREAD at a time, so no
backpressure for the latches is needed. When the DREAD_LATCH is valid, the Mbox does not start
the next DREAD-type transaction until all fill data from the previous command is returned to the
Mbox. When the IREAD_LATCH is valid, the Mbox does not start the next IREAD transaction
until either the IREAD has been aborted or all fill data from the IREAD is returned to the Mbox.
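
The address extension applied when a read is latched can be sketched as follows. The function name is hypothetical, and the sketch keeps the full byte address rather than only bits <33:3>.

```python
# Sketch of the 32-bit to 34-bit extension applied to read latch addresses.
# read_latch_address is an illustrative name, not a signal or IPR.

def read_latch_address(pa: int) -> int:
    """Set bits <33:32> to '11 for I/O space (PA<31:29> = '111), else '00."""
    pa &= 0xFFFFFFFF
    upper = 0b11 if (pa >> 29) == 0b111 else 0b00
    return (upper << 32) | pa
```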

Table 13-10 and Table 13-11 show the fields which are contained in the two read latches.

Table 13-10: IREAD_LATCH Fields

Field           Purpose

ADDRESS<31:0>   Physical address of the read request.
CMD<4:0>        Specific command being done (IREAD).
SET_NUMBER      Set to which this fill is to be allocated in Pcache.

Table 13-11: DREAD_LATCH Fields

Field           Purpose

ADDRESS<31:0>   Physical address of the read request.
CMD<4:0>        Specific command being done (DREAD, DREAD_MODIFY, DREAD_LOCK,
                IPR_READ).
SET_NUMBER      Set to which this fill is to be allocated in Pcache.



When the Mbox asserts M%ABORT_CBOX_IRD, the Cbox clears the IREAD_LATCH entry if
the reference has not yet started. If the Cbox starts the IREAD sequence before the Mbox asserts
M%ABORT_CBOX_IRD, the sequence is continued but data is not sent to the Mbox.

13.6.2 WRITE_PACKER and WRITE_QUEUE 

Writes from the Mbox go through the WRITE_PACKER and into the WRITE_QUEUE. The
WRITE_PACKER holds a quadword of data; the WRITE_QUEUE consists of 8 entries, each
of which contains a quadword of data. The purpose of the WRITE_PACKER is to accumulate
writes to the same quadword which arrive sequentially, so that only one write has to be done into
the cache.

A WRITE command with a non-I/O space address, or a WRITE or WRITE_UNLOCK to an
I/O space address preceded by an IPR_WR to the QW_PACK IPR, is packed. The IPR writes
which set and clear QW_PACK are not put into the WRITE_QUEUE. If the WRITE is to the same
octaword as the quadword which is presently being packed, the quadword in the WRITE_PACKER
is placed into the WRITE_QUEUE with the SAME_OCTAWORD bit set in the CMD field. The new
write reference is loaded into the WRITE_PACKER. If the WRITE is not to the same octaword as
the quadword which is presently being packed, the quadword in the WRITE_PACKER is placed
into the WRITE_QUEUE with the SAME_OCTAWORD bit not set in the CMD field. The new
write reference is loaded into the WRITE_PACKER. Other writes pass immediately from the
WRITE_PACKER into the WRITE_QUEUE. The WRITE_PACKER is flushed at the following
times:

• When a memory-space WRITE to a different quadword arrives. The new quadword then
remains in the write packer until a write packer flush condition is met.

• When a WRITE_UNLOCK arrives. The WRITE_UNLOCK is then passed immediately from
the WRITE_PACKER to the WRITE_QUEUE.

• **When an I/O space write arrives. If QW_PACK is set, the next two longwords are packed into
a QW entry. QW_PACK is set by an IPR_WR issued by microcode to inform the WRITE_PACKER
to pack the next two LW writes even if the address is in I/O space or the command is
a WRITE_UNLOCK. The IPR_WR takes place during the MOVQ instruction and the MTPR
MAILBOX instruction to produce QW writes to I/O space. The QW_PACK clears once the QW
is loaded into the WRITE_QUEUE. Thus MOVQ to a QW aligned address results in a single
QW write, and MB_ADDR is written with a high LW of zeroes.** Otherwise the I/O space
write is passed immediately from the WRITE_PACKER to the WRITE_QUEUE.

• When an IPR_WRITE arrives. The IPR_WRITE is then passed immediately from the
WRITE_PACKER to the WRITE_QUEUE. IPR_WRITEs to VLDST are not placed in the
WRITE_QUEUE.

• If an IREAD or a DREAD arrives to the same hexaword as that of the entry in the
WRITE_PACKER.

• Whenever the conditions for flushing the write queue are met.

• If the DISABLE_PACK bit in the CCTL IPR is set. In this case, every write passes directly
through the WRITE_PACKER without delay unless the QW_PACK IPR is set.
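
The basic packing behaviour for memory-space writes can be modelled in a few lines. This is a simplified sketch under stated assumptions: it covers only merging within one quadword and the flush caused by a write to a different quadword, it ignores the other flush conditions in the list above, and it expects write data already positioned in its byte lanes. All class and function names are hypothetical.

```python
# Simplified behavioural model of WRITE_PACKER merging (memory-space only).
# Names are illustrative; I/O, IPR and unlock handling are not modelled.
from dataclasses import dataclass


def byte_lane_mask(byte_en: int) -> int:
    """Expand an 8-bit byte enable into a 64-bit data mask."""
    mask = 0
    for lane in range(8):
        if byte_en & (1 << lane):
            mask |= 0xFF << (8 * lane)
    return mask


@dataclass
class PackedWrite:
    qw_addr: int                  # quadword address (byte address >> 3)
    byte_en: int                  # accumulated byte enables
    data: int                     # accumulated 64-bit data
    same_octaword: bool = False   # set when flushed by a same-octaword write


class WritePackerModel:
    def __init__(self) -> None:
        self.current = None       # quadword being packed, if any
        self.queue = []           # stands in for the WRITE_QUEUE

    def write(self, addr: int, byte_en: int, data: int) -> None:
        qw = addr >> 3
        keep = byte_lane_mask(byte_en)
        cur = self.current
        if cur is not None and cur.qw_addr == qw:
            # Same quadword: merge the new bytes into the packed entry.
            cur.data = (cur.data & ~keep) | (data & keep)
            cur.byte_en |= byte_en
            return
        if cur is not None:
            # Different quadword: flush, noting whether the octaword matches.
            cur.same_octaword = (cur.qw_addr >> 1) == (qw >> 1)
            self.queue.append(cur)
        self.current = PackedWrite(qw, byte_en, data & keep)
```

Two longword writes to the halves of one quadword end up as a single entry with all eight byte enables set, which is the case the packer exists to optimise.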

THREE-CYCLE LATENCY THROUGH THE WRITE_QUEUE 

If the WRITE_QUEUE and the WRITE_PACKER are empty, the latency of any write
through them is 3 cycles. The implication of this is that if any reads which flush
the WRITE_QUEUE are done alternately with writes, their execution will be greatly
slowed. This applies to IPR reads and writes and may be an issue in testing the chip.

Table 13-12 describes the fields in the WRITE_QUEUE. 

Table 13-12: WRITE_QUEUE Fields

Field            Purpose

VALID            Indicates that the entry contains valid information.
DWR_CONFLICT     Indicates that this write conflicts with a DREAD, giving the WRITE_QUEUE
                 priority. Check is done using hexaword address.
IWR_CONFLICT     Indicates that this write conflicts with an IREAD, giving the WRITE_QUEUE
                 priority. Check is done using hexaword address.
CMD<2>           Same octaword or io_write_unlock.
CMD<1:0>         Specific command being done.
ADDRESS<31:0>    Physical address of the write.
BYTE_EN<7:0>     Byte enable for the write.
DATA<63:0>       Data to be written.



The CMD field of the WRITE_QUEUE is encoded as:

• ipr_write = 00

• io_write = 01

• mem_write = 10

• unlock_write = 11

• io_unlock_write = 11 and same_ow (cmd<2>=1)
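
The encoding listed above can be decoded as sketched here. The function name is hypothetical, and the returned strings simply reuse the names from the list. For the non-unlock commands CMD<2> carries the same-octaword indication, which this sketch does not report.

```python
# Sketch: decode the 3-bit WRITE_QUEUE CMD field into a command name.
# decode_write_cmd is an illustrative helper, not part of the design.

def decode_write_cmd(cmd: int) -> str:
    """Map CMD<2:0> to one of the command names listed in the text."""
    kind = cmd & 0b11
    same_ow = bool(cmd & 0b100)
    if kind == 0b11:
        # An unlock write with CMD<2> set is an I/O-space unlock write.
        return "io_unlock_write" if same_ow else "unlock_write"
    return ("ipr_write", "io_write", "mem_write")[kind]
```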

When a quadword of data is moved into the WRITE_QUEUE, it is serviced by the Cbox arbiter
as the lowest-priority task, unless special conditions exist.

Servicing writes separately from reads allows reads to take higher priority and gets read data
back to the CPU faster. However, a read which follows a write to the same hexaword must
not be allowed to complete before the write completes. To prevent this there are conflict bits,
DWR_CONFLICT<8:0> and IWR_CONFLICT<8:0>, implemented in the WRITE_QUEUE and
WRITE_PACKER, one for each entry. The conflict bits ensure correct ordering between writes
and a DREAD or an IREAD to the same hexaword.

When a DREAD arrives, the hexaword address is checked against all entries in the
WRITE_QUEUE and WRITE_PACKER. Any entry with a matching hexaword address has its
corresponding DWR_CONFLICT bit set. The DWR_CONFLICT bit is also set if the WRITE_QUEUE
entry is an IPR_WRITE, a WRITE_UNLOCK, or an I/O space write. If any DWR_CONFLICT bit is
set, the WRITE_QUEUE takes priority over DREADs, allowing the writes to complete first.

When an IREAD arrives, the hexaword address is checked against all entries in the
WRITE_QUEUE and WRITE_PACKER. Any entry with a matching hexaword address has its
corresponding IWR_CONFLICT bit set. The IWR_CONFLICT bit is also set if the WRITE_QUEUE
entry is an IPR_WRITE, a WRITE_UNLOCK, or an I/O space write. If any IWR_CONFLICT bit is set,
the WRITE_QUEUE takes priority over IREADs, allowing the writes to complete first.

As each write is done, the conflict bits and valid bit of the entry are cleared. When the
last WRITE_QUEUE entry which conflicts with a DREAD finishes, there are no more
DWR_CONFLICT bits set, and the DREAD takes priority again, even if other WRITE_QUEUE entries
arrived after the DREAD. In this way a DREAD which conflicts with previous writes is not done
until those writes are done, but once those writes are done, the DREAD proceeds.

The analogous statement is true for an IREAD which has a conflict. If IWR_CONFLICT is set and
the IREAD is aborted before the conflicting write queue entry is processed, the WRITE_QUEUE
continues to take precedence over the IREAD_LATCH until the conflicting entry is retired.

If both a DREAD and an IREAD have a conflict in the WRITE_QUEUE, writes take priority until
one of the reads no longer has a conflict. If the DREAD no longer has a conflict, the DREAD is
then done. Then the WRITE_QUEUE continues to have priority over the IREAD_LATCH since
the IREAD has a conflict, and when the conflicting writes are done, the IREAD may proceed. If
another DREAD arrives in the meantime, it may be allowed to bypass both the writes and the
IREAD if it has no conflicts.

This mechanism is used for other cases to enforce read/write ordering. Cases where the
WRITE_QUEUE (and the WRITE_PACKER) must be flushed before proceeding are listed below:

1. DREAD_LOCK and WRITE_UNLOCK.

2. All IPR_READs and IPR_WRITEs (includes Clear Write Buffer).

3. All I/O space reads and I/O space writes. 

4. Dread or Iread conflict with a write (checked to hexaword granularity, on address bits <31:5>). 

When a DREAD_LOCK arrives from the Mbox, DWR_CONFLICT bits for all valid writes in
the WRITE_QUEUE and WRITE_PACKER are set so that all writes (WRITE_QUEUE entries)
preceding the DREAD_LOCK are done before the DREAD_LOCK is done.

When any IPR_READ arrives, all DWR_CONFLICT bits for valid entries in the WRITE_QUEUE
and WRITE_PACKER are set, forcing the writes to complete before the IPR_READ completes.
This ensures that IPR reads and writes are executed in order.

When any D-stream I/O space read arrives, all DWR_CONFLICT bits for valid entries in the
WRITE_QUEUE and WRITE_PACKER are set, so that previous writes complete first.

When any I-stream I/O space read arrives, all IWR_CONFLICT bits for valid entries in the
WRITE_QUEUE and WRITE_PACKER are set, so that previous writes complete first.

Note that when a WRITE_UNLOCK arrives, the WRITE_QUEUE is always empty as it was
previously flushed before the READ_LOCK was serviced.

When a new entry for the DREAD_LATCH arrives, it is checked for conflicts with the
WRITE_QUEUE. At this time the DWR_CONFLICT bit is set on any WRITE_QUEUE entry which is
an I/O space write, an IPR_WRITE, or a WRITE_UNLOCK. Similarly, when a new entry for
the IREAD_LATCH arrives, it is checked for conflicts with the WRITE_QUEUE. At this time
the IWR_CONFLICT bit is set on any WRITE_QUEUE entry which is an I/O space write, an
IPR_WRITE, or a WRITE_UNLOCK.

Thus, all transactions from the Mbox except memory space reads and writes unconditionally
force the flushing of the WRITE_QUEUE. Memory space reads cause a flush if they conflict with
a previous write.
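
The DREAD side of the conflict mechanism can be modelled as below. The IREAD side is analogous with IWR_CONFLICT. This is a behavioural sketch with hypothetical names. The `flush_all` argument stands for the cases in which every valid entry is marked (DREAD_LOCK, IPR_READ, and D-stream I/O space reads).

```python
# Behavioural sketch of DWR_CONFLICT marking; names are illustrative.
from dataclasses import dataclass

# Entry kinds that always conflict with a newly arrived read.
ALWAYS_CONFLICT = {"ipr_write", "io_write", "unlock_write"}


@dataclass
class QueueEntry:
    address: int               # physical byte address of the write
    kind: str                  # "mem_write", "io_write", "ipr_write", "unlock_write"
    valid: bool = True
    dwr_conflict: bool = False


def mark_dread_conflicts(entries, read_addr: int, flush_all: bool = False) -> bool:
    """Set DWR_CONFLICT bits; return True if writes must be serviced first."""
    for entry in entries:
        if not entry.valid:
            continue
        # Hexaword (32-byte) granularity: compare address bits <31:5>.
        same_hexaword = (entry.address >> 5) == (read_addr >> 5)
        if flush_all or same_hexaword or entry.kind in ALWAYS_CONFLICT:
            entry.dwr_conflict = True
    return any(e.valid and e.dwr_conflict for e in entries)


def retire(entry: QueueEntry) -> None:
    """Completing a write clears its valid and conflict bits."""
    entry.valid = False
    entry.dwr_conflict = False
```

Once the last conflicting entry has been retired, `mark_dread_conflicts` no longer reports a conflict, which mirrors the DREAD regaining priority over later writes.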

13.6.3 I/O Space Writes 

For WRITE commands with M%S6_PA<31:29> not '111, ADDRESS<33:32> = '00.

For WRITE commands with M%S6_PA<31:29> = '111, ADDRESS<33:32> = BIU_CTL<14:13>.
The IO_MAP field of the BIU_CTL is set to 01 for FLAMINGO systems, to 10 for COBRA systems,
and to 11 for LASER systems.

If the QW_PACK IPR is written, the next two longwords are packed to the WRITE_QUEUE;
otherwise the write is loaded directly.

13.6.3.1 NON-MASKED FLAMINGO I/O Writes 

Flamingo workstations require I/O space writes to be mapped to channel addresses. For full LW
(non-masked) writes: if the WS_IO bit of BIU_CTL is set, M%S6_PA<31:29> = '111, and
either BM<3:0> = '1111 or BM<7:4> = '1111, the operation is a NON-MASKED I/O WRITE.

• ADDRESS<31:29> = M%S6_PA<28:26>

• ADDRESS<28> = '0 if either BM<3:0> = '1111 or BM<7:4> = '1111 ; NON-MASKED I/O
WRITE

• ADDRESS<27> = '0 for I/O

• ADDRESS<26:5> = '0 | M%S6_PA<25:5>

• ADDRESS<4:3> = M%S6_PA<4:3>

• Write_Queue data<63:0> = S6_DATA<63:0>

• Write_Queue_BM<7:0> = BM<7:0>, sets single LW_MASK bit, longword aligned write

13.6.3.2 MASKED FLAMINGO I/O Writes 

If the WS_IO bit of BIU_CTL is set, M%S6_PA<31:29> = '111, and neither BM<3:0> nor
BM<7:4> is '1111 (a byte or word write to I/O space is required), the operation is a MASKED
I/O WRITE. Note that I/O byte/word writes to the upper LW in FLAMINGO systems (i.e. address
not quadword aligned) are UNPREDICTABLE.

• ADDRESS<31:29> = M%S6_PA<28:26>

• ADDRESS<28> = '1 if NOT (BM<3:0> = '1111 or BM<7:4> = '1111) ; MASKED I/O WRITE

• ADDRESS<27> = '0 for I/O

• ADDRESS<26:5> = M%S6_PA<25:5> | '0

• ADDRESS<4:3> = M%S6_PA<4:3>

• Write_Queue data<35:32> = BM<3:0>

• Write_Queue data<31:0> = S6_DATA<31:0>

• Write_Queue_BM<7:0> = 1111 1111, sets pair of LW_MASK bits, from M%S6_PA<4:3>

Thus a QW is written where:

bit 32 is the byte mask for data<7:0>, bit 33 is the byte mask for data<15:8>,
bit 34 is the byte mask for data<23:16>, bit 35 is the byte mask for data<31:24>
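
The two FLAMINGO mappings can be compared side by side in a short sketch. Several points here are assumptions rather than statements of the specification: the "|" in the ADDRESS<26:5> bullets is read as bit-field concatenation, WS_IO is taken to be set, ADDRESS<33:32> is taken from IO_MAP as described in Section 13.6.3, and the function names are invented for the example.

```python
# Sketch of FLAMINGO I/O write address mapping under the assumptions above.
# Function names are hypothetical; "|" in the spec is read as concatenation.

def flamingo_io_write_address(pa: int, byte_mask: int, io_map: int) -> int:
    """Return the 34-bit ADDRESS for an I/O-space write (PA<31:29> = '111)."""
    if (pa >> 29) & 0x7 != 0b111:
        raise ValueError("not an I/O space address")
    non_masked = (byte_mask & 0x0F) == 0x0F or (byte_mask & 0xF0) == 0xF0
    pa_28_26 = (pa >> 26) & 0x7
    pa_25_5 = (pa >> 5) & 0x1FFFFF
    pa_4_3 = (pa >> 3) & 0x3
    if non_masked:
        field_26_5 = pa_25_5          # '0 | PA<25:5>: zero placed in bit 26
    else:
        field_26_5 = pa_25_5 << 1     # PA<25:5> | '0: zero placed in bit 5
    address = (io_map & 0x3) << 32            # ADDRESS<33:32> = IO_MAP
    address |= pa_28_26 << 29                 # ADDRESS<31:29>
    address |= (0 if non_masked else 1) << 28 # ADDRESS<28>; ADDRESS<27> stays 0
    address |= field_26_5 << 5                # ADDRESS<26:5>
    address |= pa_4_3 << 3                    # ADDRESS<4:3>
    return address


def masked_write_data(byte_mask: int, data: int) -> int:
    """Build Write_Queue data<35:0> for a masked write: BM<3:0> above the LW."""
    return ((byte_mask & 0xF) << 32) | (data & 0xFFFFFFFF)
```

With these assumptions the masked form shifts PA<25:5> up by one bit position, which is consistent with the later remark that such accesses map to sparse I/O space.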

13.6.4 MASKED FLAMINGO I/O READS 

If the WS_IO bit of BIU_CTL is set, reads to I/O space are mapped in the same manner as
MASKED I/O Writes. All I/O space reads for FLAMINGO systems are longword reads which
map to SPARSE I/O space.

13.6.5 CM_OUT_LATCH 

The CM_OUT_LATCH holds fill data and invalidate addresses which are destined for the Mbox.
The Mbox never backpressures the Cbox (it can always receive a command from the Cbox) so a
queue is not needed. The latch has an address portion and a data portion. The fields are shown
in Table 13-13.

Table 13-13: CM_OUT_LATCH Fields

Field          Purpose

CMD<1:0>       Specific command being done.
ADDR<12:5>     Pcache index of the invalidate. This field is not used for fills.
InvReq<1:0>    Pcache set of the invalidate. This field is not used for fills.
FILL_QW<4:3>   Quadword alignment of the fill. This field is not used for invalidates.
DATA<63:0>     Fill data.



The CM_OUT_LATCH is loaded with an invalidate when pInvReq<1:0> is set by system logic.

The CM_OUT_LATCH is loaded with fill data when DREAD or IREAD data is obtained by either
a Fast External Cache Hit or READ_BLOCK.

The command from the CM_OUT_LATCH is driven on C%CBOX_CMD<1:0>. If the command
is an invalidate, the address is driven on C%CBOX_ADDR<11:5>, and no data is driven to the
Mbox. If the command is a fill, the quadword alignment is driven on C%MBOX_FILL_QW<4:3>.
(The Mbox has the hexaword address during these cycles.) Fill data is piped through the
FILL_DATA_PIPEs and driven on B%S6_DATA<63:0>. The Cbox calculates byte parity on the fill data
and drives it on B%S6_DP<7:0>.

If an IREAD is in progress in the Cbox and the Mbox asserts M%ABORT_CBOX_IRD, the Cbox
prevents any further command, address, or data for that IREAD from being driven to the Mbox,
as described in Section 13.6.7.



Table 13-14: Cbox-Mbox interface control signals

Field              Purpose

C%CBOX_CMD<1:0>    Specific command being done: either D_CF, I_CF, INVAL, or NOP.

C%REQ_DQW          Indicates that the quadword of fill data being returned was the requested
                   quadword of data: the quadword to which the original address corresponded.
                   It is also asserted if C%CBOX_HARD_ERR is asserted and the requested
                   quadword has not yet been returned; the Mbox then notifies the Ibox and/or
                   Ebox that the requested data has been returned so that the machine does
                   not hang.

C%LAST_FILL        Indicates that this is the last data being sent for the read request.

C%CBOX_HARD_ERR    Indicates that an unrecoverable error is associated with the data. This bit
                   only qualifies fills, not invalidates. When C%CBOX_HARD_ERR is asserted,
                   the Cbox also asserts C%LAST_FILL as no more fills follow.
                   C%CBOX_HARD_ERR may be asserted as the result of an uncorrectable error
                   in the Bcache or as the result of RDE on the NDAL.

C%CBOX_ECC_ERR     Indicates that a correctable backup cache ECC error is associated with the
                   current fill data and the data should be ignored. Valid for fills only, not
                   invalidates. Corrected data will follow.



If an error happens while fill data is being retrieved, the Cbox notifies the Mbox using C%CBOX_ 
HARD_ERR or C%CBOX_ECC_ERR. Table 13-15 shows how both normal cases and error cases 
are handled by the Mbox. 



Table 13-15: Cbox Mbox commands and actions

C%CBOX_CMD<1:0>             Qualifiers asserted    Mbox Action

NOP                                                Take no action.

I_CF                                               Accept fill data for outstanding IREAD.

D_CF                                               Accept fill data for outstanding DREAD.

I_CF or D_CF                C%CBOX_HARD_ERR,       Perform invalidate, expect no more fills for
                            C%LAST_FILL            this read.

I_CF or D_CF                C%CBOX_ECC_ERR         Ignore this fill data, expect fill later.

INVAL                                              Perform invalidate.

INVAL to outstanding fill                          Perform invalidate, expect fill data. Do not
                                                   validate the data in the Pcache when it returns.
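The decode in Table 13-15 can be sketched as a small lookup function. This is an illustrative Python model only; the function name `mbox_action`, its keyword arguments, and the returned strings are hypothetical, and the sketch assumes that C%CBOX_HARD_ERR takes precedence over C%CBOX_ECC_ERR if both were ever seen together (the table does not cover that case).

```python
def mbox_action(cmd, hard_err=False, ecc_err=False, inval_to_outstanding_fill=False):
    """Return the Mbox action for one C%CBOX_CMD value and its qualifiers."""
    if cmd == "NOP":
        return "take no action"
    if cmd == "INVAL":
        if inval_to_outstanding_fill:
            return "perform invalidate, expect fill data, do not validate it in the Pcache"
        return "perform invalidate"
    if cmd in ("I_CF", "D_CF"):
        if hard_err:
            # The Cbox also asserts C%LAST_FILL whenever C%CBOX_HARD_ERR is asserted.
            return "perform invalidate, expect no more fills for this read"
        if ecc_err:
            return "ignore this fill data, expect fill later"
        kind = "IREAD" if cmd == "I_CF" else "DREAD"
        return "accept fill data for outstanding " + kind
    raise ValueError("unknown command: %r" % (cmd,))
```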



13.6.6 FILL_DATA_PIPE1 and FILL_DATA_PIPE2

The FILL_DATA_PIPEs are used to pipeline the fill data for two cycles so that the Cbox drives
B%S6_DATA coincidentally with the write-enable of the Pcache. If there is a free cycle on
B%S6_DATA, the Cbox may bypass the fill data from FILL_DATA_PIPE1 (to achieve a one-cycle
bypass). This allows the Mbox to return data to the Ibox or the Ebox one cycle early. The cache
fill to the Pcache is done in the normal cycle, driven from FILL_DATA_PIPE2, even if Ebox or
Ibox data was bypassed in an earlier cycle. The timing relationships for one cache fill are shown
in Figure 13-17.

Figure 13-17: B%S6_DATA bypass timing

[Timing diagram spanning cycles 1 to 4. Signals shown: C%CBOX_CMD and C%MBOX_FILL_QW<4:3>;
M%CBOX_BYPASS_ENABLE; B%S6_DATA valid (to MD_BUS), the one-cycle data bypass; B%S6_DATA
valid (for Pcache fill), the data written to Pcache.]



In this example, a fill is just arriving in cycle 1, so the Cbox drives C%CBOX_CMD and
C%MBOX_FILL_QW<4:3>.

The Mbox drives M%CBOX_BYPASS_ENABLE to the Cbox in cycle 2 to indicate that B%S6_DATA
is free during the current cycle. This causes the Cbox to bypass data from FILL_DATA_PIPE1
to B%S6_DATA to achieve a one-cycle bypass.

In cycle 3 the Cbox drives the data from FILL_DATA_PIPE2 to the Pcache for the write. It does 
this even though the bypass was done previously, because the Pcache is always written in the 
third cycle after C%CBOX_CMD is driven with the fill command. 

The rules for the Cbox driving data on B%S6_DATA are as follows: 

1. IF FILL_DATA_PIPE2 contains valid data, drive B%S6_DATA from FILL_DATA_PIPE2.

2. ELSE IF M%CBOX_BYPASS_ENABLE is asserted and FILL_DATA_PIPE1 contains valid
data, drive from FILL_DATA_PIPE1 to achieve a one-cycle bypass.

The Mbox keeps enough state to know what the Cbox will be bypassing in any given cycle. 

When the Cbox drives B%S6_DATA, it also generates byte parity and drives B%S6_DP with the 
same timing. 
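The two drive rules for B%S6_DATA can be sketched as a priority selection. This is a hedged Python model, not the implementation; the function name `select_fill_source` is hypothetical, and an invalid pipe stage is represented by `None` purely for illustration.

```python
def select_fill_source(pipe1_data, pipe2_data, bypass_enable):
    """Return (source_name, data) for what is driven on B%S6_DATA, or None if nothing is driven.

    pipe1_data / pipe2_data hold the fill data of each pipe stage, or None when
    the stage does not contain valid data. bypass_enable models M%CBOX_BYPASS_ENABLE.
    """
    # Rule 1: FILL_DATA_PIPE2 always wins when it holds valid data (the Pcache fill cycle).
    if pipe2_data is not None:
        return ("FILL_DATA_PIPE2", pipe2_data)
    # Rule 2: otherwise a one-cycle bypass from FILL_DATA_PIPE1 is allowed
    # only when the Mbox has signalled that B%S6_DATA is free.
    if bypass_enable and pipe1_data is not None:
        return ("FILL_DATA_PIPE1", pipe1_data)
    return None
```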

The fields of the FILL_DATA_PIPEs are shown in Table 13-16.

Table 13-16: Fields of FILL_DATA_PIPE1 and FILL_DATA_PIPE2

Field         Purpose

IREAD         Indicates that fill data is for an IREAD.

DATA<63:0>    Fill data.

The IREAD field is necessary in case of an IREAD abort, as described in Section 13.6.7. If
M%ABORT_CBOX_IRD is asserted and the data in either FILL_DATA_PIPE1 or FILL_DATA_PIPE2
is for an IREAD, that FILL_DATA_PIPE must be cleared so that data is not driven back
to the Mbox.



13.6.7 IREAD Aborts 

The Mbox asserts the signal M%ABORT_CBOX_IRD to notify the Cbox to abort any IREAD which 
it is currently processing. This may happen because of a branch mispredict where the Istream 
has been prefetching from one branch and has to change over to the other. The Mbox then aborts 
all outstanding IREADs so that a new IREAD can begin. 

When the Cbox receives the abort signal, the read in question may be anywhere in the Cbox read 
sequence. The exact action taken depends on where the read is, as shown in Table 13-17.

Table 13-17: Cbox Action Upon Receiving M%ABORT_CBOX_IRD

State of the IREAD        Action Taken by the Cbox

No IREAD outstanding      No action taken.

IREAD_LATCH valid but     Clear the IREAD_LATCH so the request will not be started.
not started

IREAD in progress         Clear the TO_MBOX bit. When the fill data returns, don't send the
                          data to the Mbox.

IREAD fill data in        Clear the entry containing IREAD data so that the data is not
CM_OUT_LATCH or           returned to the Mbox.
FILL_DATA_PIPEs
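The abort cases in Table 13-17 can be sketched with a small state model. This is an illustrative Python sketch under stated assumptions: the class `CboxIreadState` and its attribute names are hypothetical, and the model treats the latch, the in-progress flag, and the queued fill entries as independent pieces of state, which the specification does not spell out.

```python
class CboxIreadState:
    """Hypothetical model of the Cbox state that an IREAD abort can touch."""

    def __init__(self):
        self.iread_latch_valid = False   # request latched but not yet started
        self.iread_in_progress = False   # request already started
        self.to_mbox = False             # TO_MBOX bit: return fill data to the Mbox
        # Entries in CM_OUT_LATCH / FILL_DATA_PIPEs, each tagged with its IREAD field.
        self.out_entries = []

    def abort_iread(self):
        """Apply M%ABORT_CBOX_IRD following Table 13-17."""
        if self.iread_latch_valid and not self.iread_in_progress:
            # Latched but not started: drop the request.
            self.iread_latch_valid = False
        if self.iread_in_progress:
            # In progress: the read completes, but its data is not sent to the Mbox.
            self.to_mbox = False
        # Fill data already queued for an IREAD is cleared; DREAD data is kept.
        self.out_entries = [e for e in self.out_entries if not e["iread"]]
```

With no IREAD outstanding, `abort_iread()` leaves the state unchanged, matching the first row of the table.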



Figure 13-18 shows an example of timing for the Cbox abort response. In cycle 1, M%ABORT_ 
CBOX_IRD is asserted during phase 2. The Cbox is ready to drive the I_CF command and B%S6_ 
DATA during phase 4. The assertion of M%ABORT_CBOX_IRD prevents both of those actions. 

The next IREAD may appear two cycles after the abort. 

Figure 13-18: M%ABORT_CBOX_IRD Timing

[Timing diagram spanning cycles 1 and 2. Signals shown: M%ABORT_CBOX_IRD; C%CBOX_CMD = I_CF,
not driven due to abort; B%S6_DATA for I_CF, not driven due to abort; the point at which the
Mbox may send the next IREAD.]

If M%ABORT_CBOX_IRD is received after the system backmaps have been instructed to map the
reference, either by pMapWE for cache hits or by a READ_BLOCK for a miss, the Pcache index
to which the IREAD was to be done must be invalidated to prevent the Pcache from retaining
a block which is not backmapped. If IABORT is taken after the ARB sequencer has advanced
to 'RDN' (read second octaword), 'SYS_READ' (read block), or 'FILL' (wait for data to be loaded
to Pcache), an invalidate of the location to which the block was to be allocated is driven to the
CM_OUT_LATCH.

13.7 Arbiter/Bus Control 

The Arbitration/Bus Control Sequencer selects the highest priority command from the
DREAD_LATCH, IREAD_LATCH, or Write Queue.

The following sequences are executed:

1. DREAD

2. READ LOCK

3. IPR READ

4. IREAD

5. WRITE

6. WRITE BYTE/WORD

7. WRITE UNLOCK

8. IPR_WR

13.7.1 Dispatch Controller 

The ARB/Bus Control Sequencer controls two satellite machines, the DISPATCH and FILL
controllers. The DISPATCH controller selects the next command, controls the WRITE_QUEUE
pointers, and drives the required address to the pads. When the Arb Machine is ready to process
a new read or write request, the DISPATCH controller is enabled. In the first cpu cycle of
dispatching a read or write command, the DISPATCH controller determines which command is
highest priority and asserts the command code to the ARB Sequencer. The Dispatch commands
are:

1. DREAD: DREAD_LATCH valid with DREAD CMD, not io_space address, and no Dread/Write
Conflict bits are set

2. DREAD_IO: DREAD_LATCH valid with DREAD CMD, io_space address, and no Dread/Write
Conflict bits are set

3. DREAD_LOCK: DREAD_LATCH valid with READ_LOCK CMD and no Dread/Write Conflict
bits are set

4. IPR_READ: DREAD_LATCH valid with IPR_READ CMD and no Dread/Write Conflict bits
are set

5. IREAD: the DREAD_LATCH is empty or Dread/Write Conflict bits are set in the Write Queue,
and IREAD_LATCH valid, not io_space address, and no Iread/Write Conflict bits are set

6. IREAD_IO: the DREAD_LATCH is empty or Dread/Write Conflict bits are set in the Write
Queue, and IREAD_LATCH valid, io_space address, and no Iread/Write Conflict bits are set

7. WRITE_UNLOCK: the DREAD_LATCH is not valid or Dread/Write Conflict, and the
IREAD_LATCH is not valid or Iread/Write Conflict, and the Write Queue CMD = Write_Unlock and
not io_space address

8. WRITE: the DREAD_LATCH is not valid or Dread/Write Conflict, and the IREAD_LATCH is
not valid or Iread/Write Conflict, and the Write Queue CMD = Write and not io_space address

9. IO_WRITE: the DREAD_LATCH is not valid or Dread/Write Conflict, and the IREAD_LATCH
is not valid or Iread/Write Conflict, and the Write Queue CMD = Write and io_space address

10. WRITE_UNLOCK_IO: the DREAD_LATCH is not valid or Dread/Write Conflict, and the
IREAD_LATCH is not valid or Iread/Write Conflict, and the Write Queue CMD = Write_Unlock
and io_space address

11. IPR_WRITE: the DREAD_LATCH is not valid or Dread/Write Conflict, and the
IREAD_LATCH is not valid or Iread/Write Conflict, and the Write Queue CMD = IPR_WRITE

12. NOP: the DREAD_LATCH is not valid or Dread/Write Conflict, and the IREAD_LATCH is not
valid or Iread/Write Conflict, and the Write Queue is empty

NOTE: READ_LOCK to I/O space is not implemented.
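The priority order of the twelve dispatch commands can be sketched as follows. This is a minimal Python model for illustration; the function name `select_dispatch` and the dictionary keys (`cmd`, `io`, `conflict`) are hypothetical, and a latch or queue head that is empty is represented by `None`.

```python
def select_dispatch(dread=None, iread=None, write=None):
    """Pick the dispatch command from the DREAD_LATCH, IREAD_LATCH, and Write Queue head.

    dread: None or {"cmd": "DREAD" | "READ_LOCK" | "IPR_READ", "io": bool, "conflict": bool}
    iread: None or {"io": bool, "conflict": bool}
    write: None or {"cmd": "WRITE" | "WRITE_UNLOCK" | "IPR_WRITE", "io": bool}
    """
    # D-stream reads have the highest priority unless a Dread/Write conflict blocks them.
    if dread is not None and not dread["conflict"]:
        if dread["cmd"] == "READ_LOCK":
            return "DREAD_LOCK"          # READ_LOCK to I/O space is not implemented
        if dread["cmd"] == "IPR_READ":
            return "IPR_READ"
        return "DREAD_IO" if dread["io"] else "DREAD"
    # I-stream reads come next, again only without an Iread/Write conflict.
    if iread is not None and not iread["conflict"]:
        return "IREAD_IO" if iread["io"] else "IREAD"
    # Writes are selected when both read latches are empty or blocked by conflicts.
    if write is not None:
        if write["cmd"] == "IPR_WRITE":
            return "IPR_WRITE"
        if write["cmd"] == "WRITE_UNLOCK":
            return "WRITE_UNLOCK_IO" if write["io"] else "WRITE_UNLOCK"
        return "IO_WRITE" if write["io"] else "WRITE"
    return "NOP"
```

A conflicted read therefore lets the Write Queue drain first, which is how the conflict is eventually cleared.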

By phase 1 of the second cpu cycle of a dispatch request, the selected address from either the
DREAD latch, IREAD latch, or WRITE QUEUE is driven onto the internal address bus to the
pads. By the next phase 3 the selected address starts to be driven externally. The ARB controller
changes state once per cache_speed (i.e. 2, 3, or 4) cpu cycles, with the ARB 'AND' array enabled
at phase 3, and the ARB 'OR' array selecting during phase 4.

Figure 13-19: DISPATCH timing

[Timing diagram: dispatch timing for cache_speed, showing dispatch cycle 1 and cache cycle 1
across cpu cycles 1 to 4, with the ARB PLA 'AND' and 'OR' array phases marked. Events labeled:
ADDRESS TO PADS, CMD TO ARB, ADDRESS DRIVES, ARB PLA, LATCH TAG, LATCH DATA.]



The DREAD latch or IREAD latch can receive a new request as late as phase 2 of cpu cycle 1 of the
dispatch. The Dispatch command and address source are determined in phase 3 and the address is
driven to the pads in phase 4 of cpu cycle 1, allowing 3 phases to drive the address to the pad drivers.
The D and I conflict bits for a newly received READ request are not determined until phase 1 of cpu
cycle 2. The I and D conflict bits are sent with the dispatch command to the ARB Controller. If the
dispatch command is DREAD, DREAD_IO, DREAD_LOCK, or IPR_READ and a D conflict exists,
or the dispatch command is IREAD or IREAD_IO and an I conflict exists, the dispatch_in signal is
cleared and the ARB state remains 'IDLE' for the next SYS_CLK cycle.

13.7.2 Fill Controller 

The FILL controller checks ECC or parity, corrects single bit ECC errors, sets BIU_STAT on
errors, moves input data to the CM_OUT_LATCH, merges write data, and generates check bits
when enabled by the ARB sequencer. The FILL controller is started by FILL_CMDs from the
ARB sequencer.

1. FILL_IDLE - wait for command

2. FILL_RD_1 - fill first octaword of cache read

3. FILL_RD_2 - fill second octaword of cache read

4. FILL_SYS - fill block from READ_BLOCK or LDxL, or QW if IO_SPACE

5. FILL_BWM_SYS - merge write data with LDxL data from system, generate ECC

6. FILL_EG - generate ECC on write data

7. FILL_BWM_DIR - merge write data with cache read data, generate ECC

The fill rate is limited to one quadword every two cpu cycles.

13.7.3 ARB PLA INPUTS 

The following signals are inputs to the ARB PLA "AND ARRAY" and are used in determining the 
next output and state transition of the ARB Sequencer. 

dsp_cmd<3:0> - Dispatch Commands

arb_state<4:0> - ARB STATE

cack<2:0> - IDLE, HARD_ERROR, SOFT_ERROR, STxC_FAIL, OK

dispatch_in - dispatch command present

bcache_en - BIU_CTL<0> = '1

not bcache_en or "PV" - BIU_CTL<0> = '0 or BIU_CTL<PV> = '1

hold_in - hold_req and dispatch and not (WRITE, WRITE_UNLOCK, or WRITE_IO)

hold_req - holdReq_h pin is asserted

err_in - error detection enabled (err_flag) and an error is detected

stall_req - tagOK_l and holdReq_h are checked at phase 4
(synchronized from last phase 3 of cache probe cycle)

stall_wr - not tagOK_l or hold request at phase 4 of last cpu cycle of ARB state

ird_abort - IABORT

same_octaword - from WRITE_QUEUE, pack QW unless OUT_BUF not empty

byte_word_write - WRITE_QUEUE BM<7:4 or 3:0> not '1111 or '0000

bwr_chain - byte/word write in progress

fill_done - Fill Sequencer operation completed

read_hit - match, valid, correct tag and ctrl parity

write_hit - match, valid, not shared, correct tag and ctrl parity

13.7.4 ARB PLA OUTPUTS 

The ARB PLA outputs next state, enable, and data path control signals.

arb_state - next ARB STATE

dispatch_flag - enable dispatch_in next access

hold_en - enable hold

err_flag - enable error logic/input

tagok_stall - block fill done latch

iread_chain_set - set iread in progress

pcread_chain_set - set Pcache read in progress

io_chain_set - set IO in progress

bwr_chain_set - set bwr in progress

all_chains_clr - clear all in progress state

FILL_CMD<2:0> - IDLE, RD_1, RD_2, SYS, BWM_SYS, EG, BWM_DIR

data_write_reg_ld_en - load OUT_BUF with QW being packed

ipr_rd_en - return ipr read data

ipr_wr_en - WRITE_QUEUE data to ipr

rl_retire_en - clear I or D read latch valid flag

pMapWE_en - enable map write strobe

lw_mask_calc_en - set LWMask<7:0> from address<4:3> and WRITE_QUEUE byte mask bits

new_addr4_ld - toggle dataA<4> at phase 3 of last cpu cycle of next ARB cycle

ce_en - assert dataCEOE<3:0> and tagCEOE

tce_en - assert tagCEOE

tag_probe_req - enable tag compare

tce_dis - deassert tagCEOE at end of next SYS_CLK cycle

dataceoe_dis - deassert data chip enables dataCEOE<3:0> at end of next SYS_CLK

in_data_lat_en - latch cache input at end of next SYS_CLK cycle

wr_arm_en - causes the dataWE_h<3:0> signals to be "armed"

sys_dp_ctrl_en - data path control to fill sequencer

creq_lat_en - latch new CREQ

CREQ<2:0> - IDLE, READ_BLOCK, WRITE_BLOCK, LDxL, STxC





13.7.5 IDLE 

'IDLE' is the next state upon the completion of all ARB sequences. Dispatch_flag is not asserted
when entering 'IDLE', therefore a one SYS_CLK nop cycle exists between ARB requests. The
'IDLE' term enables dispatch_flag, allowing the next request to be processed. **When the Serial
ROM is being read by microcode and the SROM is output enabled (SOE-IE[SROM_OE] = 1), the
dispatch_in signal is seen as reasserted by the ARB PLA if the dispatch command is WRITE.
This allows microcode to write data to Pcache, with the corresponding write through data going
to the Write_Queue. The external WRITE request from the queue is "dropped" while the SROM
data is transferred to Pcache.**

13.7.6 DISPATCH 

This section describes the dispatch fork; the outputs enabled in response to the dispatch selection, 
and the next ARB state selection. 

1. NOP and not hold_in: 'IDLE'

dispatch_flag - retry dispatch
hold_en - enable hold

2. DREAD and Bcache enabled and not hold_in: 'DRD', start fast external cache read sequence

pcread_chain_set - set Pcache read in progress
FILL_RD_1 - fill of first octaword begins at end of next SYS_CLK cycle
tce_dis - deassert tag chip enable at end of next SYS_CLK cycle
tag_probe_req - start tag compare at end of next SYS_CLK cycle
in_data_lat_en - latch cache input at end of next SYS_CLK cycle

3. DREAD and Bcache not enabled and not hold_in: 'SYS_RD', no Bcache, direct to system read

err_flag - enable err_in (cack = hard error)
pcread_chain_set - set Pcache read in progress
FILL_SYS - fill block when CACK = OK or SOFT
sys_dp_ctrl_en - data path control to fill sequencer
creq_lat_en - latch new CREQ
CREQ - READ_BLOCK

4. DREAD_IO and not hold_in: 'SYS_RD', I/O space direct to system read

err_flag - enable err_in (cack = hard error)
FILL_SYS - fill target QW (not pcread_chain_set) when CACK = OK or SOFT
sys_dp_ctrl_en - data path control to fill sequencer
creq_lat_en - latch new CREQ
CREQ - READ_BLOCK
io_chain_set - set IO in progress

5. DREAD_LOCK and not hold_in: 'SYS_RD', read_lock, MUST LOCK OUT IREADS TILL
STxC pass or IPR_WR

err_flag - enable err_in (cack = hard error)
pcread_chain_set - set Pcache read in progress
FILL_SYS - fill block when CACK = OK or SOFT
sys_dp_ctrl_en - data path control to fill sequencer
creq_lat_en - latch new CREQ
CREQ - LDxL

6. IREAD and Bcache enabled and not IABORT and not hold_in: 'IRD', start fast external cache
read sequence

pcread_chain_set - set read in progress
iread_chain_set - set iread in progress
FILL_RD_1 - fill of first octaword begins at end of next SYS_CLK cycle
tce_dis - deassert tag chip enable at end of next SYS_CLK cycle
tag_probe_req - start tag compare at end of next SYS_CLK cycle
in_data_lat_en - latch cache input at end of next SYS_CLK cycle

7. IREAD and not Bcache enabled and not IABORT and not hold_in: 'SYS_RD', no Bcache, direct
to system read, set iread

err_flag - enable err_in (cack = hard error)
pcread_chain_set - set Pcache read in progress
iread_chain_set - set iread in progress
FILL_SYS - fill block when CACK = OK or SOFT
sys_dp_ctrl_en - data path control to fill sequencer
creq_lat_en - latch new CREQ
CREQ - READ_BLOCK

8. IREAD_IO and not IABORT and not hold_in: 'SYS_RD', I/O space direct to system read, set
iread and IO in progress

err_flag - enable err_in (cack = hard error)
iread_chain_set - set iread in progress
FILL_SYS - fill block when CACK = OK or SOFT
sys_dp_ctrl_en - data path control to fill sequencer
creq_lat_en - latch new CREQ
CREQ - READ_BLOCK
io_chain_set - set IO in progress

9. IREAD or IREAD_IO and IABORT and not hold_in: 'IDLE', IABORT before iread starts

dispatch_flag - retry dispatch
hold_en - enable hold

10. IPR_READ and not hold_in: 'IDLE', ipr_rd_en, rl_retire_en

11. IPR_WRITE and not hold_in: 'IDLE', ipr_wr_en

12. WRITE and byte_word and not "PV" and Bcache enabled and not hold request: 'BWR_PROBE',
start cache read for RMW

bwr_chain_set - set bwr in progress
lw_mask_calc_en - set LWMask<7:0> from address<4:3> and WRITE_QUEUE byte mask bits
FILL_BWM_DIR - merge target QW from cache at end of next SYS_CLK cycle
tce_dis - deassert tag chip enable at end of next SYS_CLK cycle
dataceoe_dis - deassert data chip enables at end of next SYS_CLK
tag_probe_req - start tag compare at end of next SYS_CLK cycle
in_data_lat_en - latch cache input at end of next SYS_CLK cycle

13. WRITE and byte_word and not "PV" and Bcache enabled and hold request: 'BWR_STALL',
wait for holdreq to deassert

bwr_chain_set - set bwr in progress
hold_en - enable hold
ce_en - assert dataCEOE<3:0>

14. WRITE and byte_word and not "PV" and not Bcache enabled: 'BWR_SYS_RD', byte_word
write, no cache, not "PV"

err_flag - enable err_in (cack = hard error)
FILL_BWM_SYS - merge target QW when CACK = OK or SOFT
sys_dp_ctrl_en - data path control to fill sequencer
creq_lat_en - latch new CREQ
CREQ - LDxL
bwr_chain_set - set bwr in progress
lw_mask_calc_en - set LWMask<7:0> from address<4:3> and WRITE_QUEUE byte mask bits

15. WRITE and not byte_word and same_octaword: 'IDLE', enable PACK_WRITE to OUT_BUF

hold_en - enable hold
FILL_EG - generate ECC on write data
data_write_reg_en - load OUT_BUF with QW being packed
lw_mask_calc_en - set LWMask<7:0> from address<4:3> and WRITE_QUEUE byte mask bits

16. WRITE and not byte_word and not "PV" and not same_octaword and Bcache enabled and not
hold request: 'WR_PROBE', start fast external tag read

lw_mask_calc_en - set LWMask<7:0> from address<4:3> and WRITE_QUEUE byte mask bits
FILL_EG - generate ECC on write data
data_write_reg_en - load OUT_BUF with QW being packed
tce_dis - deassert tag chip enable at end of next SYS_CLK cycle
tag_probe_req - start tag compare at end of next SYS_CLK cycle

17. WRITE and not byte_word and not "PV" and not same_octaword and Bcache enabled and
hold request: 'WR_STALL', wait for holdreq to deassert

lw_mask_calc_en - set LWMask<7:0> from address<4:3> and WRITE_QUEUE byte mask bits
FILL_EG - generate ECC on write data
data_write_reg_en - load OUT_BUF with QW being packed
hold_en - enable hold
ce_en - assert dataCEOE<3:0>

18. WRITE and (not byte_word or "PV") and not same_octaword and (not Bcache enabled or "PV"):
'SYS_WR', no cache or "PV", start system write

err_flag - enable err_in (cack = hard error)
sys_dp_ctrl_en - data path control to fill sequencer
lw_mask_calc_en - set LWMask<7:0> from address<4:3> and WRITE_QUEUE byte mask bits
FILL_EG - generate ECC on write data
data_write_reg_en - load OUT_BUF with QW being packed
creq_lat_en - latch new CREQ
CREQ - SYS_WR

19. IO_WRITE: 'SYS_WR', IO space write direct to WRITE_BLOCK

err_flag - enable err_in (cack = hard error)
sys_dp_ctrl_en - data path control to fill sequencer
lw_mask_calc_en - set LWMask<7:0> from address<4:3> and WRITE_QUEUE byte mask bits
FILL_EG - generate ECC on write data
data_write_reg_en - load OUT_BUF with QW being packed
creq_lat_en - latch new CREQ
CREQ - SYS_WR
io_chain_set - set IO in progress

20. WRITE_UNLOCK: 'BWR_SYSMERGE', assume all write_unlocks to be byte_word type, get data
from IN_BUF

lw_mask_calc_en - set LWMask<7:0> from address<4:3> and WRITE_QUEUE byte mask bits
FILL_BWM_DIR - merge target QW from cache at end of next SYS_CLK cycle

21. WRITE_UNLOCK_IO: 'SYS_WR', IO space write direct to STxC

err_flag - enable err_in (cack = hard error)
sys_dp_ctrl_en - data path control to fill sequencer
lw_mask_calc_en - set LWMask<7:0> from address<4:3> and WRITE_QUEUE byte mask bits
FILL_EG - generate ECC on write data
data_write_reg_en - load OUT_BUF with QW being packed
creq_lat_en - latch new CREQ
CREQ - STxC
io_chain_set - set IO in progress

22. hold_in: hold request and hold_en and not dispatch of (WRITE or WRITE_IO or
WRITE_UNLOCK): 'STALL', keep hold_en



13.7.6.1 PACK_WRITE

The Write_Packer asserts the same_octaword bit in a Write_Queue entry when a new write request
is to the alternate QW of the octaword which is presently in the Write_Packer, and the
Write_Packer byte mask bits indicate only full longwords.

When a write command is received by the ARB Controller from the Write_Queue with
same_octaword, it is known the next entry will be to the same octaword, so the entry of 1 or 2 LWs is
moved to the OUT_BUF, and the write bus cycle is deferred until the next Write command. **If the
same_octaword bit is set in the Write_Queue and the OUT_BUF is not empty, the write address is
returning to the quadword already packed in the OUT_BUF. Since this write may not be to the same
LW as the previous one, packing at this point cannot proceed. The ARB PLA input for same_octaword
is deasserted and the write bus cycle proceeds.**

The quadword of data with ECC check bits (or parity) is moved to OUT_BUF<63:0> if Address<3>
= '0, and to OUT_BUF<127:64> if Address<3> = '1. The LW_MASK register is set from the byte
mask bits BM<7:0> as

if address<4:3> = '00 LW_MASK<0> = '1 if BM<3:0> is not '0000
if address<4:3> = '00 LW_MASK<1> = '1 if BM<7:4> is not '0000
if address<4:3> = '01 LW_MASK<2> = '1 if BM<3:0> is not '0000
if address<4:3> = '01 LW_MASK<3> = '1 if BM<7:4> is not '0000
if address<4:3> = '10 LW_MASK<4> = '1 if BM<3:0> is not '0000
if address<4:3> = '10 LW_MASK<5> = '1 if BM<7:4> is not '0000
if address<4:3> = '11 LW_MASK<6> = '1 if BM<3:0> is not '0000
if address<4:3> = '11 LW_MASK<7> = '1 if BM<7:4> is not '0000

When same_octaword indicates the present WRITE_QUEUE QW is to be packed at the OUT_BUF,
the valid longwords are set as

• X0 = '1 if BM<3:0> is not '0000

• X1 = '1 if BM<7:4> is not '0000

and are used to indicate the byte masks for the packed QW in "PV" writes.
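The LW_MASK rules above reduce to a short computation: each quadword selects a pair of LW_MASK bits by address<4:3>, and each bit of the pair is set when the corresponding nibble of BM<7:0> is non-zero. A minimal Python sketch follows; the function name `lw_mask_bits` is hypothetical.

```python
def lw_mask_bits(address_4_3, bm):
    """Return the LW_MASK<7:0> bits contributed by one quadword write.

    address_4_3 is the 2-bit value of address<4:3>; bm is the 8-bit byte mask BM<7:0>.
    """
    if not 0 <= address_4_3 <= 3:
        raise ValueError("address<4:3> must be in 0..3")
    if not 0 <= bm <= 0xFF:
        raise ValueError("BM<7:0> must fit in 8 bits")
    low_lw = 1 if (bm & 0x0F) != 0 else 0    # BM<3:0> is not '0000
    high_lw = 1 if (bm & 0xF0) != 0 else 0   # BM<7:4> is not '0000
    base = 2 * address_4_3                   # '00 -> bits 0,1; '01 -> 2,3; '10 -> 4,5; '11 -> 6,7
    return (low_lw << base) | (high_lw << (base + 1))
```

Packing two quadwords into the OUT_BUF corresponds to OR-ing the results of two calls into the same LW_MASK value.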

13.7.6.2 IPR_READ 

The Arb Control State machine executes an IPR_RD if an IPR_RD is in the DREAD_LATCH and
no Dread/Write Conflict bits are set (i.e. the Write Queue has emptied).

The IPR address is decoded and the data is driven to the CM_OUT_LATCH and the
DREAD_LATCH clears. The next state is 'IDLE'; dispatch is not enabled.

13.7.6.3 HIGH_LW_TEMP

When a quadword aligned read of I/O space is performed, the high LW of data is latched in this
register. When a non quadword aligned read to I/O space is dispatched and
BIU_CTL<QW_IO_RD> = '1, the data from HIGH_LW_TEMP is returned as if for an IPR_READ. The
bus cycle is not done.

13.7.6.4 DREAD_LOCK 

The Arb Control State Machine sequences directly to the 'SYS_RD' state if a DREAD_LOCK is in 
the DREAD_LATCH and no Dread/Write Conflict bits are set (i.e. the Write Queue has emptied), 
and tagOK_l and holdReq_h are deasserted. 

DREAD_LOCK is issued by microcode for interlock instructions. No further Istream references
are tried until the data read via the DREAD_LOCK is modified and successfully written back to
memory using a STxC bus cycle that is CommandACKnowledged OK. After modifying the
read_lock data, microcode issues a write_unlock which results in a STxC. Microcode then reads the
STxC_IPR to see if the data was written successfully. If the STxC indicates fail, the interlock
could not be completed, and microcode retries the sequence from the DREAD_LOCK.

If a DREAD_LOCK results in a hard error, the error handler executes an IPR_WR to CEFSTS to
restart Istream processing.

**The DREAD_LOCK dispatch sets a flop inhibiting IREADs until STxC is executed successfully
or an IPR_WR (CEFSTS @ AC(hex)) is received at the CBOX.**
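The microcode interlock sequence (read_lock, modify, write_unlock, check STxC, retry on fail) can be sketched as a retry loop. This Python model is illustrative only: the class `FakeBus`, its methods, and the `max_tries` bound are hypothetical and are not taken from the specification, which does not state a retry limit.

```python
class FakeBus:
    """Hypothetical memory port whose first `fail_count` STxC attempts fail."""

    def __init__(self, value, fail_count):
        self.value = value
        self.fail_count = fail_count
        self.stxc_fail = False           # models the fail indication read from the STxC IPR

    def read_lock(self):
        return self.value

    def write_unlock(self, new_value):
        if self.fail_count > 0:
            self.fail_count -= 1
            self.stxc_fail = True        # STxC_FAIL: memory is not updated
        else:
            self.value = new_value
            self.stxc_fail = False       # STxC acknowledged OK

    def read_stxc_ipr(self):
        return self.stxc_fail


def interlocked_update(bus, modify, max_tries=8):
    """Repeat read_lock / write_unlock until the STxC succeeds; return the attempt count."""
    for attempt in range(1, max_tries + 1):
        old = bus.read_lock()
        bus.write_unlock(modify(old))
        if not bus.read_stxc_ipr():
            return attempt
    raise RuntimeError("interlock did not complete within max_tries (assumed bound)")
```

With a bus that fails twice, the loop succeeds on the third pass and memory holds the modified value.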

13.7.6.5 WRITE 

A non byte write is the highest priority bus request when

the DREAD_LATCH is not valid or Dread/Write Conflict

the IREAD_LATCH is not valid or Iread/Write Conflict

the Write Queue CMD = Write

BM<7:4> = '1111 or '0000 or "PV"

BM<3:0> = '1111 or '0000 or "PV"

The WRITE_QUEUE address is moved to the pads, the data is latched in the ECC/parity generate
section, and the WRITE_QUEUE head is advanced for a dispatch with CMD = Write. The possible
ARB breakouts are:

• 'PACK_WRITE' if SAME_OCTAWORD and the OUT_BUF is empty (LW_MASK<7:0> =
'00000000)

• 'WRITE_WAIT' if not SAME_OCTAWORD or the OUT_BUF is not empty and hold_req

• 'WRITE_PROBE' if not SAME_OCTAWORD or the OUT_BUF is not empty and not hold_req
and (bcache_en and not "PV")

• 'SYS_WRITE' if not SAME_OCTAWORD or the OUT_BUF is not empty and not hold_req and
(not bcache_en or "PV")

The Write Queue data with ECC check bits is moved to OUT_BUF<63:0> if Address<3> = '0, and
to OUT_BUF<127:64> if Address<3> = '1, and the appropriate LW_MASK bits are set as in the
PACK_WRITE dispatch.

13.7.6.6 BWR 

If a byte write is the highest priority bus request,

the DREAD_LATCH is not valid or Dread/Write Conflict

the IREAD_LATCH is not valid or Iread/Write Conflict

the Write Queue CMD = Write

not "PV" mode

either BM<7:4> is not ('1111 or '0000)

or BM<3:0> is not ('1111 or '0000)

the 'BWR_PROBE' state is entered if not stall_request, else 'BWR_STALL'.
Byte and word writes for "PV" mode go directly to 'SYS_WRITE'.
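The byte/word test on the byte mask and the resulting state choice can be sketched as follows. This is a hedged Python illustration; the function names `is_byte_word_write` and `bwr_next_state` are hypothetical, and the sketch covers only the cases stated in this subsection (a write that is not a byte/word write is left to the WRITE dispatch and reported as `None`).

```python
def is_byte_word_write(bm):
    """True when either nibble of BM<7:0> is neither '1111 nor '0000."""
    low, high = bm & 0xF, (bm >> 4) & 0xF
    return low not in (0x0, 0xF) or high not in (0x0, 0xF)


def bwr_next_state(bm, pv_mode, stall_request):
    """Return the ARB state entered for a byte/word write, or None if it is not one."""
    if not is_byte_word_write(bm):
        return None                      # handled by the non byte WRITE dispatch
    if pv_mode:
        return "SYS_WRITE"               # "PV" mode writes go directly to the system
    return "BWR_STALL" if stall_request else "BWR_PROBE"
```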



13.7.6.7 WRITE_UNLOCK

If a Write_Unlock is the highest priority bus request,

the DREAD_LATCH is not valid or Dread/Write Conflict
the IREAD_LATCH is not valid or Iread/Write Conflict
the Write Queue CMD = Write_Unlock

the 'SYS_WR' state is entered. cReq_h<2:0> is driven with STxC, and cWMask<7:0> is driven
from LW_MASK<7:0> if "PV", else from BM<7:0>. The ARB state remains 'SYS_WR' until cAck
is not idle.

if cAck is IDLE, ARB state remains 'SYS_WR'
if cAck is HARD_ERROR, the error is logged,
c%cbox_h_err is asserted, microcode is signalled STxC PASS so as not to retry
if cAck is SOFT_ERROR, the error is logged,
c%cbox_s_err is asserted, proceed as OK
if cAck is STxC_FAIL, the STxC IPR bit 2 is set to '1.
if cAck is OK, the STxC IPR bit 2 is set to '0.
if cAck is OK or STxC_FAIL, the next state is 'IDLE'

An IPR read of the STxC register follows the Write_Unlock. Microcode repeats the interlock loop
(i.e. read_lock/write_unlock) if the STxC register indicates fail. **An IPR_RD of STxC with bit
2 = '0 re-enables CBOX IREAD processing and re-enables the MBOX IREF latch.** **If the
READ_LOCK results in a hard error microtrap, microcode executes an IPR_WR (CEFSTS @ AC(hex)) to
re-enable the CBOX IREAD processing and the MBOX IREF latch.**
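The cAck cases for the Write_Unlock can be sketched as one decision function. This Python model is illustrative; the function name `write_unlock_cack` and its return shape are hypothetical. Two points are assumptions rather than statements from the text: "signalled STxC PASS" on a hard error is modeled as clearing bit 2, and the next state after a hard error is returned as `None` because this subsection does not give it.

```python
def write_unlock_cack(cack, stxc_bit2):
    """Return (next_state, stxc_bit2, asserted_signals) for one cAck sample in 'SYS_WR'."""
    if cack == "IDLE":
        return ("SYS_WR", stxc_bit2, [])         # keep waiting for the acknowledge
    if cack == "STxC_FAIL":
        return ("IDLE", 1, [])                   # microcode will retry the interlock
    if cack == "OK":
        return ("IDLE", 0, [])
    if cack == "SOFT_ERROR":
        # The error is logged, then the sequence proceeds as for OK.
        return ("IDLE", 0, ["c%cbox_s_err"])
    if cack == "HARD_ERROR":
        # Assumption: STxC PASS is modeled as bit 2 = 0; next state is not specified here.
        return (None, 0, ["c%cbox_h_err"])
    raise ValueError("unknown cAck value: %r" % (cack,))
```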
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13.7.7 DRD 

The DREAD address began driving at phase 3 of the second cpu cycle of the Dispatch Cycle. The
'DREAD' state is 2, 3, or 4 cpu cycles in duration as programmed from cache_speed. At
phase 4 of the last cpu cycle of 'DRD'

• tagAdr_h<31:17>, tagAdrP_h, tagCtlV_h, tagCtlD_h, tagCtlS_h, and tagCtlP_h are latched

• data_h<127:0> and check_h<27:0> are latched in the INPUT_BUF<dataA_h<4>>.

• the enable for tagCEOE is deasserted, tagCEOE is deasserted at pins at next phase 2

The next state is 'RDC'.

err_flag - enable err_in (tag or ctl parity)

new_addr4_ld - toggle dataA<4> at phase 3 of last cpu cycle of next ARB cycle

pmapwe_en - assert pmapwe if cache data fills Pcache

13.7.8 IRD 

The IREAD address began driving at phase 3 of the second cpu cycle of the Dispatch Cycle. The
'IREAD' state is 2, 3, or 4 cpu cycles in duration as programmed from cache_speed. At
phase 4 of the last cpu cycle of 'IRD'

• tagAdr_h<31:17>, tagAdrP_h, tagCtlV_h, tagCtlD_h, tagCtlS_h, and tagCtlP_h are latched

• data_h<127:0> and check_h<27:0> are latched in the INPUT_BUF<dataA_h<4>>.

• the enable for tagCEOE is deasserted, tagCEOE is deasserted at pins at next phase 2

1. If IABORT, the next state is 'IDLE'.

dispatch_flag - enable dispatch_in next access

hold_en - enable hold

dataceoe_dis - deassert data chip enables at end of next SYS_CLK

all_chains_clr - clear all in progress state

If ABORT_CBOX_IRD is asserted, the loading of the CM_OUT_LATCH is inhibited so that
data is not returned to the MBOX. ABORT_CBOX_IRD inhibits errors from the IREAD.

IABORT is inhibited when pcread_chain and not iread_chain.

2. If not IABORT, the next state is 'RDC', PLA outputs same as 'DRD'.

err_flag - enable err_in (tag or ctl parity)

new_addr4_ld - toggle dataA<4> at phase 3 of last cpu cycle of next ARB cycle

pmapwe_en - assert pmapwe if cache data fills Pcache

13.7.9 RDC 

In the first cpu cycle of 'RDC' 

• The target quadword is moved from the data pads to the ECC, ECC check begins at phase 3 

• The target quadword is loaded into CM_OUT_LATCH at phase 4 and C_PIPE_%REQ_DQW 
is set to tag the selected quadword of data. 

• Address<31:21/17> is compared to tagAdr_h<31:21/17> as specified by cache_size, tagCtlV_h 
is checked, and tag and control parity are checked. 

• tagOK_l and holdReq_h are checked at phase 4 (synchronized from last phase 3 of cache probe 
cycle) 

read_hit is determined as 

tagAdr<31:22/17> matches adr_h<31:22/17> 
tagCtlV_h is true 
tagCtlP_h and tagAdrP_h are correct 
or force hit 
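
The read_hit term above can be sketched as a small behavioral model. This is an illustration only, not the chip's logic equations: the function names, the Boolean `parity_ok` input (standing for "tagCtlP_h and tagAdrP_h are correct"), and the `low_bit` parameter (standing for the low tag bit selected by cache_size, 22 down to 17) are assumptions made for the example.

```python
def tag_bits(value: int, low_bit: int) -> int:
    """Return bits <31:low_bit> of a 32-bit value."""
    return (value & 0xFFFFFFFF) >> low_bit


def read_hit(adr: int, tag_adr: int, tag_ctl_v: bool, parity_ok: bool,
             force_hit: bool = False, low_bit: int = 17) -> bool:
    """Hypothetical model of read_hit.

    Hit when the tag matches the address over bits <31:low_bit>, the valid
    bit is set and both parity checks are good, or when force hit is set.
    """
    match = tag_bits(adr, low_bit) == tag_bits(tag_adr, low_bit)
    return (match and tag_ctl_v and parity_ok) or force_hit
```

With `low_bit=17` only address bits below bit 17 are ignored by the compare; a larger cache_size corresponds to a larger `low_bit`, so fewer tag bits take part.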

stall request is not tagOK_l or hold request at phase 4 of first cpu cycle of ARB state. 

In the second cpu cycle of 'RDC' 

• At phase 1 both read hit, and no ECC error are valid 

• At phase 2 if not read hit, or ECC error, or stall request, then C%CBOX_ECC_ERR is asserted 
causing the MBOX to ignore the data in CM_OUT_LATCH 

• At phase 2 if read hit and not stall request the proper pMapWE signal is enabled (asserts at 
phase 3 at pins) to support system backmaps of Pcache 

In the last cpu cycle of 'RDC' 

• At phase 3 dataA_h<4> toggles to begin access of second octaword 

• At phase 3 the ARB sequencer determines the next state 

If cache_speed is 3 or 4 cpu cycles the FILL machine loads the second quadword of the block 
during cpu cycle 3 of the 'RDC state if ECC was good for the target QW. 

1. If not IABORT and stall request, the next state is 'STALL', wait for stall request to end 
(returning the cache resource to the NVAX Plus chip) 

tagok_stall - block fill done latch 

hold_en - enable hold 

all_chains_clr - clear all in progress state 

2. If not IABORT and not stall request and read_hit, the next state is 'RDN'. 

- fill QWs 3 and 4 

- enable error logic/input 

- deassert data chip enables at end of next SYS_CLK 

- latch cache input at end of next SYS_CLK cycle 

- clear I or D read latch valid flag 

3. If not IABORT and not stall request and not read_hit, the next state is 'SYS_RD'. 

- fill block when dRack 

- latch new CREQ 

- READ_BLOCK 

- enable error logic/input 

- deassert data chip enables at end of next SYS_CLK 

- data path control to fill sequencer 

4. If IABORT, the next state is 'IDLE', the IREAD_LATCH valid bit is cleared, need to remove 
index in Pcache which system backmap replaced!! 

5. If tagOK_l and either tagCtlP_h or tagAdrP_h is not correct, the fill is stopped, the error 
is logged, c%cbox_s_err is asserted, and the ARB state returns to 'IDLE'. 
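
The five exit conditions of 'RDC' can be summarized as a next-state function. The sketch below is a reading of the list above, not the ARB PLA itself. The evaluation order (IABORT first, then stall, then tag error) is an assumption: the text does not state a priority between items 3 and 5 when a tag parity error also makes read_hit false.

```python
def rdc_next_state(iabort: bool, stall_request: bool,
                   read_hit: bool, tag_error: bool) -> str:
    """Hypothetical next-state decision at the end of the 'RDC' state."""
    if iabort:
        return "IDLE"    # item 4: IREAD_LATCH valid bit is cleared
    if stall_request:
        return "STALL"   # item 1: wait for the stall request to end
    if tag_error:
        return "IDLE"    # item 5: error logged, c%cbox_s_err asserted
    if read_hit:
        return "RDN"     # item 2: continue with the second octaword
    return "SYS_RD"      # item 3: miss, request the block from the system
```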

13.7.10 RDN 

The address for the second octaword began driving the previous phase 3. For cache_speed = 2 
timing the second quadword is moved to the CM_OUT_LATCH during this state. At phase 2 
of the first cpu cycle of 'RDN' the enable for selected pMapWE is deasserted (pMapWE_h<1:0> 
deasserts at phase 3 at the pins). At phase 4 of the last cpu cycle of 'RDN' the second quadword 
is latched at the data pads, and the fill sequencer is notified that the second octaword is present. 

1. If not IABORT, the next state is 'FILL', enable err_flag. 

2. If IABORT, the next state is 'IDLE'. The IREAD_LATCH valid bit is cleared, need to remove 
index in Pcache which system backmap replaced!! 

13.7.11 FILL 

The ARB machine stays in FILL until the fill_done signal is received from the FILL sequencer 
indicating the read is complete, or an error or IABORT is detected. 

1. If not fill_done and not error and not IABORT, remain at 'FILL'. 

err_flag - enable error logic/input 

hold_en - enable hold 

2. If fill_done and not error and not IABORT, return to 'IDLE'. 

dispatch_flag - enable dispatch_in next access 

hold_en - enable hold 

all_chains_clr - clear all in progress state 

The fill is complete, C_PIPE_%LAST_FILL is set by the FILL sequencer to tag the last 
quadword of data. 

If address<31:29> is '111, "Return_I/O_Data" is driven to the FILL sequencer. The INPUT_BUF 
quadword addressed by address<4:3> is driven to the ECC check latch. C_PIPE_%REQ_DQW 
and C_PIPE_%LAST_FILL are set to indicate selected and only return data. 

3. If IABORT and not error, the next state is 'IDLE', the IREAD_LATCH valid bit is cleared. If 
'FILL' from SYS_READ need to remove index in Pcache which system backmap replaced!! 

4. If error, the next state is 'IDLE', and the error is logged. 

13.7.12 SYS_RD 

The 'SYS_RD' state is entered from 

1. DISPATCH for DREAD no Bcache, DREAD_IO, IREAD no Bcache, or IREAD_IO, cReq_h<2:0> 
is READ_BLOCK 

2. DISPATCH for DREAD_LOCK, cReq_h<2:0> is LDxL. 

3. 'RDC' for DREAD miss, cReq_h<2:0> is READ_BLOCK 

The cWMask lines are as 

• cWMask[1:0] are address[4:3] 

• cWMask[2] is '1 if not I/O space, Pcache allocate (EV D-stream) 

• cWMask[3] indicates Pcache set being allocated, for systems which support a backmap for 
each set 

• cWMask[4] indicates I-stream 
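
The cWMask encoding for a system read can be illustrated with a short packing function. This is a sketch under stated assumptions: the function and parameter names are invented for the example, and I/O space is taken to be address<31:29> = '111, as used for "Return_I/O_Data" in the 'FILL' state description.

```python
def is_io_space(address: int) -> bool:
    """Assumed I/O space test: address<31:29> = '111."""
    return ((address >> 29) & 0x7) == 0x7


def sys_rd_cwmask(address: int, pcache_set: int, i_stream: bool) -> int:
    """Hypothetical packing of cWMask<4:0> for a 'SYS_RD' request."""
    mask = (address >> 3) & 0x3      # cWMask[1:0] = address[4:3]
    if not is_io_space(address):
        mask |= 1 << 2               # cWMask[2]: not I/O space, Pcache allocate
    if pcache_set & 1:
        mask |= 1 << 3               # cWMask[3]: Pcache set being allocated
    if i_stream:
        mask |= 1 << 4               # cWMask[4]: I-stream reference
    return mask
```

For example, a D-stream read of memory address 0x18 allocating Pcache set 1 packs to binary 01111, while an I-stream read of I/O address 0xE0000008 packs to binary 10001.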

The cReq_h lines become valid with the first sysClkOut1_h rising edge after the first cpu cycle of 
'SYS_RD'. The 'SYS_RD' state repeats until cAck_h<2:0> returns error or OK. 

1. If CACK_IDLE, remain at 'SYS_RD'. 

err_flag - enable error logic/input 

sys_dp_ctrl_en - data path control to fill sequencer 

hold_en - enable hold 

2. If CACK_OK and not IABORT, the next state is 'FILL'. 

err_flag - enable error logic/input 

rl_retire_en - clear I or D read latch valid flag 

3. If not CACK_IDLE and IABORT, the next state is 'IDLE', need to remove index in Pcache 
which system backmap replaced!! 

4. If error, the next state is 'IDLE', and the error is logged. 

13.7.12.1 Read Errors 

• bad tagCtlP_h -> c%cbox_s_err; c%cbox_hard_err; (machine check) 

• bad tagAdrP_h -> c%cbox_s_err; c%cbox_hard_err; (machine check) 

• single bit ECC errors -> c%cbox_s_err 

• double bit ECC -> c%cbox_s_err; c%cbox_hard_err; (machine check) 

• cAck_h = SOFT_ERROR -> c%cbox_s_err 

• cAck_h = HARD_ERROR -> c%cbox_s_err; c%cbox_hard_err; (machine check) 

13.7.13 WR_STALL 

When a non_byte_word WRITE with the Bcache enabled and not "PV" is dispatched, the address, 
data and mask logic is set, and the entry is removed from the WRITE_QUEUE. 

write_stall is not tagOK_l or hold request at phase 4 of last cpu cycle of ARB state. 

If write_stall occurs before the non_byte_word write sequence (WR_PROBE/probe, WR_CMP/compare, 
WR/write) can be completed or during the DISPATCH of the non_byte_word WRITE, the ARB 
state machine loops in 'WR_STALL' until the write_stall deasserts 

tagok_stall - block fill done latch 

hold_en - enable hold 

ce_en - assert dataCEOE<3:0> 

and then advances to 'WR_PROBE', 

tag_probe_req - start tag compare at end of next SYS_CLK cycle 

tce_dis - deassert tag chip enable at end of next SYS_CLK cycle 

restarting the non_byte_word write sequence with address, data, and mask already at the pins from 
the DISPATCH. 

13.7.14 WR_PROBE 

If 'WR_PROBE' is entered from DISPATCH, the address from the Write Queue began driving at 
phase 3 of the second cpu cycle of the Dispatch Cycle. 

The 'WR_PROBE' state is 2, 3, or 4 cpu cycles in duration as programmed from cache_speed. At 
phase 4 of the last cpu cycle of 'WR_PROBE' 

• tagAdr_h<31:17>, tagAdrP_h, tagCtlV_h, tagCtlD_h, tagCtlS_h, and tagCtlP_h are latched 

• the enable for tagCEOE is deasserted, tagceoe is deasserted at pins at next phase 2 

The next state is 'WR_CMP'. wr_arm_en causes the dataWE_h<3:0> signals to be "readied" from 
LW_MASK<3:0> if address<4> = '0, and from LW_MASK<7:4> if address<4> = '1. tagCtlWE_h 
is "armed". 
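
The selection of the longword write enables from LW_MASK can be written out as follows. This is a minimal sketch of the rule stated above; the function name is an assumption and the model ignores timing.

```python
def armed_data_we(lw_mask: int, address: int) -> int:
    """Return the 4-bit dataWE_h<3:0> pattern readied by wr_arm_en.

    address<4> selects which octaword of the block is written, and so
    which half of the 8-bit LW_MASK supplies the write enables.
    """
    if (address >> 4) & 1:
        return (lw_mask >> 4) & 0xF   # address<4> = '1: LW_MASK<7:4>
    return lw_mask & 0xF              # address<4> = '0: LW_MASK<3:0>
```

For example, with LW_MASK = A5 (hex) the readied enables are 5 (hex) for the lower octaword and A (hex) for the upper octaword.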

13.7.15 WR_CMP 

Write hit is determined, where write_hit equals 

tagAdr<31:22/17> matches adr_h<31:22/17> 
tagCtlV_h is true 
tagCtlS_h is false 
tagCtlP_h and tagAdrP_h are correct 
or force hit 

The next state is 

1. If write_hit and not write_stall and not tag_error, the next state is 'WR'. 

2. If not write_hit and not write_stall and not tag_error, the next state is 'SYS_WR', and 
tagCtlWE and dataWE<3:0> are "disabled". 

err_flag - enable err_in (cack = hard error) 

sys_dp_ctrl_en - data path control to fill sequencer 

creq_lat_en - latch new CREQ 

CREQ - WRITE_BLOCK 

3. If write_stall, the next state is 'WR_STALL', and tagCtlWE and dataWE<3:0> are "disabled". 

4. If not write_stall and tag_error (either tagCtlP_h or tagAdrP_h is not correct), tagCtlWE 
and dataWE<3:0> are "disabled", the error is logged, c%cbox_s_err is asserted, and the ARB 
state returns to 'IDLE'. 

Figure 13-21: wr_stall timing 

13.7.16 WR 

data_h<127:0> and check_h<27:0> are driven onto the EDAL from the OUT_BUF. The tagCtl lines 
are driven as 

• tagCtlD_h is DIRTY 

• tagCtlV_h is not changed 

• tagCtlS_h is not changed 

• tagCtlP_h is toggled if tagCtl_h was previously CLEAN 

If write_stall sampled at the previous phase 4 is true, tagCtlWE and dataWE<3:0> are "disabled", 
and the write sequence is retried after the write_stall is completed. 

If write_stall sampled at the previous phase 4 is not asserted, tagCtlWE and the selected 
dataWE<3:0> signals are driven from phase 2 of the first cpu cycle through phase 2 of the last 
cpu cycle of 'WR', and the LW_MASK register is cleared. 

1. If not write_stall, the write has completed successfully, the next state is 'IDLE'. 

dispatch_flag - enable dispatch_in next access 

hold_en - enable hold 

all_chains_clr - clear all in progress state 

2. If write_stall, the write enables were blocked, the next state is 'WR_STALL'. 

13.7.17 BWR_STALL 

When a byte_word WRITE with the Bcache enabled and not "PV" is dispatched, the address, data 
and mask logic is set, and the entry is removed from the WRITE_QUEUE. 

write_stall is not tagOK_l or hold request at phase 4 of last cpu cycle of ARB state. 

If write_stall occurs before the byte_word write sequence (BWR_PROBE/probe, BWR_CMP/compare, 
BWR_MERGE/merge, WR/write) can be completed or during the DISPATCH of the byte_word 
WRITE, the ARB state machine loops in 'BWR_STALL' until the write_stall deasserts 

- block fill done latch 

- enable hold 

- assert dataCEOE<3:0> 

and then advances to 'BWR_PROBE', 

- enable error logic/input 

- merge target QW from cache at end of next SYS_CLK cycle 

- latch cache input at end of next SYS_CLK cycle 

- start tag compare at end of next SYS_CLK cycle 

- deassert tag chip enable at end of next SYS_CLK cycle 

restarting the byte_word write sequence with address and mask already at the pins from the 
DISPATCH, and the WRITE_QUEUE data already at the Merge register. 

13.7.18 BWR_PROBE 

The READ_BYTE/WORD address began driving at phase 3 of the second cpu cycle of the Dispatch 
Cycle. The 'READ_BYTE/WORD' state is 2, 3, or 4 cpu cycles in duration as programmed from 
cache_speed. At phase 4 of the last cpu cycle of 'READ_BYTE/WORD' 

• tagAdr_h<31:17>, tagAdrP_h, tagCtlV_h, tagCtlD_h, tagCtlS_h, and tagCtlP_h are latched 

• data_h<127:0> and check_h<27:0> are latched in the INPUT_BUF<dataA_h<4>>. 

• the enable for tagCEOE is deasserted, tagceoe is deasserted at pins at next phase 2 

The data from the WRITE_QUEUE is loaded into the MERGE register. The next state is 
'BWR_CMP'. 

13.7.19 BWR_CMP 

The quadword of data from the INPUT_BUF pointed to by address<4:3> is driven to the "ECC/MERGE" 
logic. ECC is checked, single bit errors are corrected. 

• single bit ECC errors -> c%cbox_s_err 

• double bit ECC on target quadword aborts "byte/word write"; -> c%cbox_h_err 

The data is merged and loaded at the output drivers as in ARB state 'BWR_MERGE'. Write hit 
is determined. The next state is 

1. If write_hit and not write_stall and not (tag_error or fill_error), the next state is 
'BWR_MERGE'. wr_arm_en causes the dataWE_h<3:0> signals to be "armed" from LW_MASK<3:0> 
if address<4> = '0, and from LW_MASK<7:4> if address<4> = '1. wr_arm_en causes 
tagCtlWE_h to be "armed". If a single bit ECC error is corrected for the read data the 
error is logged and c%cbox_s_err is set. 

2. If not write_hit and not write_stall and not tag_error, the next state is 'BWR_SYS_RD'. 
cReq_h<2:0> is driven with LDxL. 

- enable err_in (cack = hard error) 

- merge target QW from cache at end of next SYS_CLK cycle 

- data path control to fill sequencer 

- latch new CREQ 

- LDxL 

3. If write_stall, the next state is 'BWR_STALL'. 

4. If not write_stall and tag_error (either tagCtlP_h or tagAdrP_h is not correct), the error 
is logged, c%cbox_s_err is asserted, and the ARB state returns to 'IDLE'. 

5. If not write_stall and fill_error (uncorrectable ECC), the error is logged, c%cbox_h_err is 
asserted, and the ARB state returns to 'IDLE'. 

13.7.20 BWR_MERGE 

The data is merged and loaded at the output drivers. 

if BM<0> = '1 data<07:00> = Write_Queue<07:00>; if BM<0> = '0 data<07:00> = MERGE_register<07:00> 

if BM<1> = '1 data<15:08> = Write_Queue<15:08>; if BM<1> = '0 data<15:08> = MERGE_register<15:08> 

if BM<2> = '1 data<23:16> = Write_Queue<23:16>; if BM<2> = '0 data<23:16> = MERGE_register<23:16> 

if BM<3> = '1 data<31:24> = Write_Queue<31:24>; if BM<3> = '0 data<31:24> = MERGE_register<31:24> 

if BM<4> = '1 data<39:32> = Write_Queue<39:32>; if BM<4> = '0 data<39:32> = MERGE_register<39:32> 

if BM<5> = '1 data<47:40> = Write_Queue<47:40>; if BM<5> = '0 data<47:40> = MERGE_register<47:40> 

if BM<6> = '1 data<55:48> = Write_Queue<55:48>; if BM<6> = '0 data<55:48> = MERGE_register<55:48> 

if BM<7> = '1 data<63:56> = Write_Queue<63:56>; if BM<7> = '0 data<63:56> = MERGE_register<63:56> 

ECC check bits are generated for data<63:0> which is loaded into the OUT_BUF. 
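
The byte-lane merge can be modeled in a few lines. This is a behavioral sketch only: the function name is an assumption, the quadwords are modeled as 64-bit integers, and ECC generation is not shown.

```python
def merge_quadword(write_queue: int, merge_register: int, byte_mask: int) -> int:
    """Merge write data into a quadword read from the cache.

    For each byte lane n (0..7): if BM<n> is set the byte comes from the
    Write Queue, otherwise it comes from the MERGE register.
    """
    result = 0
    for n in range(8):
        lane = 0xFF << (8 * n)
        source = write_queue if (byte_mask >> n) & 1 else merge_register
        result |= source & lane
    return result
```

A byte mask of 0F (hex), for example, takes the low longword from the Write Queue and keeps the high longword read from the cache.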

1. If fill_done and not write_stall, the next state is 'BWR_WR'. 

2. If not fill_done and not write_stall, the state remains 'BWR_MERGE', dataWE_h<3:0> and 
tagCtlWE_h are "RE-armed". 

3. If write_stall, the next state is 'BWR_STALL'. 

13.7.21 BWR 

data_h<127:0> and check_h<27:0> are driven onto the EDAL from the OUT_BUF. The tagCtl lines 
are driven as 

• tagCtlD_h is DIRTY 

• tagCtlV_h is not changed 

• tagCtlS_h is not changed 

• tagCtlP_h is toggled if tagCtl_h was previously CLEAN 

If write_stall sampled at the previous phase 4 is true, tagCtlWE and dataWE<3:0> are "disabled", 
and the byte_word write sequence is retried after the write_stall is completed. 

If write_stall sampled at the previous phase 4 is not asserted, tagCtlWE and the selected 
dataWE<3:0> signals are driven from phase 2 of the first cpu cycle through phase 2 of the last 
cpu cycle of 'BWR', and the LW_MASK register is cleared. 

1. If not write_stall, the write has completed successfully, the next state is 'IDLE'. 

2. If write_stall, the write enables were blocked, the next state is 'BWR_STALL'. 

13.7.22 BWR_SYS_RD 

The ARB state remains 'BWR_SYS_RD' until the system completes the LDxL command. 

1. If CACK = idle, wait in 'BWR_SYS_RD'. 

err_flag - enable error logic/input 

sys_dp_ctrl_en - data path control to fill sequencer 

hold_en - enable hold 

2. If CACK = OK or soft error, the next state is 'BWR_SYS_MERGE', and err_flag is enabled for 
the ECC check. If soft error, the error is logged and c%cbox_s_err is asserted. 

3. If CACK = hard error, the next state is 'IDLE', the error is logged in BIU_STAT and 
BIU_ADDR, c%cbox_h_err is asserted, and the "byte/word write" sequence is aborted. 

13.7.23 BWR_SYS_MERGE 

The quadword of data from the INPUT_BUF pointed to by address<4:3> is driven to the "ECC/MERGE" 
logic. ECC is checked, single bit errors are corrected. 

• single bit ECC errors -> c%cbox_s_err 

• double bit ECC on target quadword aborts "byte/word write"; -> c%cbox_h_err 

The data is merged and loaded at the output drivers as in ARB state 'BWR_MERGE'. ECC check 
bits are generated for data<63:0> which is loaded into the OUT_BUF. 

1. If not fill_done and not hard_error, the state remains 'BWR_SYS_MERGE', keep err_flag 
enabled for ECC check. 

2. If fill_done and not hard_error, the next state is 'SYS_WR'. If a single bit ECC error is 
corrected for the read data the error is logged and c%cbox_s_err is set. cReq_h<2:0> is 
driven with STxC, and cWMask<7:0> is driven from LW_MASK<7:0>. LW_MASK is set 
from BM<7:0> and address<3:0> as in the 'PACK_WRITE' state. Bits of LW_MASK<7:0> 
previously set in the 'PACK_WRITE' state remain set. The address buffer is not loaded and 
remains the same. 

err_flag - enable error logic/input 

sys_dp_ctrl_en - data path control to fill sequencer 

creq_lat_en - latch new CREQ 

CREQ - STxC 

3. If hard_error, the next state is 'IDLE', the error is logged in BIU_STAT and BIU_ADDR, 
c%cbox_h_err is asserted, and the "byte/word write" sequence is aborted. 

13.7.24 SYS_WR 

At the first SYS_CLK rising edge on entry to 'SYS_WR' cReq_h<2:0> is driven with 

• WRITE_BLOCK if entered from DISPATCH or 'WR_CMP' 

• STxC if entered from 'BWR_SYS_MERGE'. 

Also at SYS_CLK, cWMask<7:0> is driven from 

• LW_MASK<7:0> if not "PV" 

• WRITE_QUEUE BM<7:0> if "PV" 

If the write is for a "PV" system 

• Addr<3> indicates which QW in the OUT_BUF is to be written from the byte mask driven to 
cWMask<7:0> 

• dataWE_h<0> = X0 <- '1 if LW_MASK 0,2,4,6 was set previously at 'PACK_WRITE' 

• dataWE_h<1> = X1 <- '1 if LW_MASK 1,3,5,7 was set previously at 'PACK_WRITE' 

1. If CACK = idle and not error, wait in 'SYS_WR'. 

err_flag - enable error logic/input 

sys_dp_ctrl_en - data path control to fill sequencer 

hold_en - enable hold 

2. If CACK = OK, or STxC_FAIL and not bwr_chain, the next state is 'IDLE'. 

dispatch_flag - enable dispatch_in next access 

hold_en - enable hold 

all_chains_clr - clear all in progress state 

If CACK = STxC_FAIL and not bwr_chain, set bit of STxC_RESULT register to indicate 
write_unlock failure to microcode. 

3. If CACK = STxC_FAIL and bwr_chain, the next state is 'BWR_SYS_RD', retry RMW with 
LDxL. 

err_flag - enable err_in (cack = hard error) 

FILL_BWM_DIR - merge target QW from cache at end of next SYS_CLK cycle 

sys_dp_ctrl_en - data path control to fill sequencer 

creq_lat_en - latch new CREQ 

CREQ - LDxL 

4. If error (CACK not idle, OK, or STxC_FAIL), the next state is 'ERR'. If CACK = soft error, the 
error is logged and c%cbox_s_err is asserted. If CACK = hard error, the error is logged and 
c%cbox_h_err is asserted. 
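
The four exits from 'SYS_WR' can be sketched as a function of the cAck response and the bwr_chain flag. The string encodings of the cAck responses and the function name are assumptions made for illustration; the real cAck_h<2:0> codes are not reproduced here.

```python
from typing import Tuple


def sys_wr_next_state(cack: str, bwr_chain: bool) -> Tuple[str, bool]:
    """Hypothetical 'SYS_WR' exit decision.

    Returns (next_state, set_stxc_fail), where set_stxc_fail models setting
    the STxC_RESULT bit that reports write_unlock failure to microcode.
    """
    if cack == "IDLE":
        return ("SYS_WR", False)          # item 1: keep waiting
    if cack == "OK":
        return ("IDLE", False)            # item 2: write complete
    if cack == "STXC_FAIL":
        if bwr_chain:
            return ("BWR_SYS_RD", False)  # item 3: retry the RMW with LDxL
        return ("IDLE", True)             # item 2: report failure to microcode
    return ("ERR", False)                 # item 4: soft or hard error
```

The split on bwr_chain reflects the two users of STxC: a byte/word write retries the read-modify-write in hardware, while a microcode write_unlock sees the failure through the STxC register and repeats the interlock loop itself.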



13.8 CBOX Error Handling Summary 

The Error Handling logic asserts two signals to the MBOX (C%CBOX_ECC_ERR, 
C%CBOX_HARD_ERR) and two signals to the Interrupt Section (C%CBOX_S_ERR, C%CBOX_H_ERR). 
C%CBOX_ECC_ERR is set when a fill command sent to the MBOX is to be ignored. 
C%CBOX_ECC_ERR is set when an ECC or parity error with fill data is detected. C%CBOX_ECC_ERR 
is also used for the non-error purpose of cancelling a fill for a cache miss or stall. 
C%CBOX_HARD_ERR causes the MBOX to end an I_MISS or D_MISS fill sequence. C%CBOX_S_ERR and 
C%CBOX_H_ERR are asserted as a result of loading the error bits in the BIU_STAT register. 
C%CBOX_S_ERR is edge sensitive (a pulse is asserted) and C%CBOX_H_ERR is level sensitive 
and remains asserted until the error bits in the BIU_STAT are cleared. A summary of the NVAX 
Plus CBOX error logic is shown in Table 13-18. 



Table 13-18: NVAX Plus CBOX Error Handling 

Problem: Tag Parity Error or Tag Control Parity Error 

  Situation:    DREAD, IREAD 
  ERR CTL:      Assert C%CBOX_S_ERR. Command ARB to go to ERROR state. Generate 
                C%CBOX_HARD_ERR when ARB sends I_CF or D_CF. 
  ARB/IPR CTL:  Send I_CF or D_CF to MBOX and abort. Latch appropriate BIU_STAT bits. 
  FILL:         Aborts due to MISS. 

  Situation:    mem WRITE 
  ERR CTL:      Assert C%CBOX_H_ERR. Command ARB to abort. 
  ARB/IPR CTL:  ARB aborts. Latch appropriate BIU_STAT bits. 
  FILL:         Aborts on a BYTE/WORD WRITE, not involved yet otherwise. 

Problem: Correctable ECC error 

  Situation:    Any Read, including I/O read 
  ERR CTL:      Assert C%CBOX_S_ERR. 
  ARB/IPR CTL:  Latch appropriate BIU_STAT bits. Wait for Fill to complete. 
  FILL:         Assert C%CBOX_ECC_ERR, send corrected data to MBOX. 

  Situation:    BYTE/WORD WRITE, WRITE_UNLOCK, WRITE 
  ERR CTL:      Assert C%CBOX_S_ERR. 
  ARB/IPR CTL:  Latch appropriate BIU_STAT bits. Wait for Fill to complete the MERGE. 
  FILL:         Continue the MERGE with corrected data. 

Problem: Uncorrectable ECC error or Parity Error 

  Situation:    Any Read, including I/O read 
  ERR CTL:      Assert C%CBOX_S_ERR. 
  ARB/IPR CTL:  Latch appropriate BIU_STAT bits. Wait for Fill to complete. 
  FILL:         Assert C%CBOX_ECC_ERR, send C%CBOX_HARD_ERR along with I_CF or D_CF. 

  Situation:    BYTE/WORD WRITE, WRITE_UNLOCK, WRITE 
  ERR CTL:      Assert C%CBOX_H_ERR. 
  ARB/IPR CTL:  Latch appropriate BIU_STAT bits. Wait for Fill to signal complete. 
  FILL:         Abort Merge, restart ARB. 

Problem: cAck Hard Error 

  Situation:    Any READ, DREAD, DREAD_IO, DREAD_LOCK, IREAD, IREAD_IO 
  ERR CTL:      Assert C%CBOX_S_ERR. Command ARB to go to ERROR state. Generate 
                C%CBOX_HARD_ERR when ARB sends I_CF or D_CF. 
  ARB/IPR CTL:  Send I_CF or D_CF to MBOX and abort. Latch appropriate BIU_STAT bits. 
  FILL:         Aborts due to cAck hard error. 

  Situation:    Any Write, WRITE_UNLOCK, WRITE, IO_WR_UNLOCK 
  ERR CTL:      Command ARB to abort. Assert C%CBOX_H_ERR. 
  ARB/IPR CTL:  Latch appropriate BIU_STAT bits. ARB aborts. 
  FILL:         Aborts due to cAck hard error. 

Problem: cAck Soft Error 

  Situation:    Any READ, including I/O read 
  ERR CTL:      Assert C%CBOX_S_ERR. 
  ARB/IPR CTL:  Latch appropriate BIU_STAT bits. Wait for Fill to complete. 
  FILL:         Complete the FILL. 

  Situation:    Any WRITE, WRITE_UNLOCK, WRITE, IO_WR_UNLOCK 
  ERR CTL:      Assert C%CBOX_S_ERR. 
  ARB/IPR CTL:  Latch appropriate BIU_STAT bits. Wait for Fill to complete the MERGE. 
  FILL:         Continue the MERGE with corrected data. 

13.9 Invalidates 

The external system logic is responsible for keeping the primary cache coherent. If the Pcache is 
being allocated as two way associative, NVAX Plus asserts pMapWE_h<0> when filling Pcache set 
0 and pMapWE_h<1> when filling Pcache set 1 to support systems with backmaps. If the Pcache 
is being allocated as direct mapped, NVAX Plus asserts pMapWE_h<0> when filling Pcache. 

For two way associative operation pInvReq<0> indicates an entry in Pcache set 0 is to be 
invalidated, while pInvReq<1> indicates an entry in Pcache set 1 is to be invalidated, where 
iAdr<11:5> determines the index to be invalidated. 

In direct map mode pInvReq<0> and iAdr<12:5> indicate the entry to be invalidated. If iAdr<12> 
= '0 set 0 is invalidated at index = iAdr<11:5>, and if iAdr<12> = '1 set 1 is invalidated at index 
= iAdr<11:5>. 
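
The two decode rules can be compared side by side in a short model. This is a sketch under stated assumptions: the function name and the list-of-tuples return value are invented for the example, and in direct map mode pInvReq<1> is assumed to be ignored, which the text implies but does not state.

```python
def decode_invalidate(p_inv_req: int, i_adr: int, direct_mapped: bool) -> list:
    """Return the (set, index) pairs invalidated by one request.

    p_inv_req models pInvReq<1:0>; i_adr models the iAdr lines.
    """
    index = (i_adr >> 5) & 0x7F              # iAdr<11:5>
    if direct_mapped:
        if not (p_inv_req & 1):              # only pInvReq<0> is used
            return []
        return [((i_adr >> 12) & 1, index)]  # iAdr<12> selects the set
    # Two way associative: each pInvReq bit names one set.
    return [(s, index) for s in (0, 1) if (p_inv_req >> s) & 1]
```

A system without a backmap that asserts both pInvReq bits produces two pairs, one per set, at the same index.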

Systems using two way associative allocation which do not backmap the Pcache issue invalidates 
to both sets of the Pcache when a block is displaced from the Bcache. The index to be invalidated 
is driven to iAdr<11:5> and pInvReq<1:0> are both asserted. The MBOX modification for NVAX 
Plus allows invalidates of the address in CM_OUT_LATCH<12:5> for a single Pcache set, as 
specified by CM_OUT_LATCH[InvReq]. The CBOX sequences invalidates to set 0 in the first 
cpu_clk cycle of a system cycle, and to set 1 in the second cpu_clk cycle of a system cycle. 

The CBOX sources an invalidate when an IABORT is received and the ARB sequencer has already 
issued a pMapWE or read to the system which updates the Pcache backmap. Since the present 
entry in the Pcache may not be removed if an IABORT is detected in ARB states 'RDC', 'RDN', 
'SYS_RD', or 'FILL', it is necessary to invalidate the index which was to be allocated, since the 
backmap no longer contains this address. 

Systems which do not backmap, which allocate the Pcache as two-way associative and therefore 
assert both pInvReq<1:0>, cannot request invalidates in consecutive sys_clk cycles. 



13.10 Revision History 



Table 13-19: Revision History 

Who            When           Description of change 

Gil Wolrich    15-Nov-1990    NVAX PLUS release for external review. 
Gil Wolrich    30-Jan-1991    remove vectors features. 
Gil Wolrich    01-Aug-1991    update 
Gil Wolrich    21-Oct-1991    update pMapWE tuning 

Chapter 14 
Error Handling 



This chapter describes the NVAX Plus error exceptions and interrupts as seen from the macrocoder's 
point of view. It is organized with respect to the SCB vectors through which the event is dis- 
patched. The SCB layout and SCB vector format are described in the Architecture Summary 
chapter of the NVAX Plus chip specification. 

14.1 Terminology 



Term     Meaning 

Fill     Any quadword of data returned to the NVAX Plus chip in response to a read-type 
         operation. The quadword containing the requested data is a fill. 

Dirty    In the Bcache, a bit is stored with each hexaword called the dirty bit. When set 
         this bit indicates that memory does not have the updated data for this block. 

Flush    Causing victim writebacks to memory of all dirty blocks in Bcache. 



14.2 Error handling Introduction and Summary 

This chapter discusses all levels of hardware and microcode-detected errors. Error notification 
occurs through one of the following events, listed in order of decreasing severity. 

• Console error halt - A halt to console mode is caused by one of several errors such as Interrupt 
Stack Not Valid. For certain halt conditions, the console prompts for a command and waits 
for operator input. For other halt conditions, the console may attempt a system restart or a 
system bootstrap as defined by DEC Standard 032. The actual algorithms used are outside 
of the scope of this document. 

• Machine check - A hardware error occurred synchronously with respect to the execution of 
instructions. Instruction-level recovery and retry may be possible. 

• Hard error interrupt - A hardware error occurred asynchronously with respect to the 
execution of instructions. Usually, data is lost or state is corrupted, and instruction-level recovery 
may not be possible. 

• Soft error interrupt - A hardware error occurred asynchronously with respect to the execution 
of instructions. The error is not fatal to the execution of instructions, and instruction-level 
recovery is usually possible. 

• Kernel stack not valid - During exception processing, a memory management exception 
occurred while trying to push information on the kernel stack. 

This chapter explains in detail several of the SCB entry points. The purpose is to help the 
operating system programmer determine exactly what error occurred and to recommend an error 
recovery method. 

The following information is given in this chapter for each SCB entry point: 

• What parameters are pushed on the stack. 

• What failure codes are defined. 

• What additional information exists and should be collected for analysis. 

• How to determine what error(s) actually occurred. 

• How to restore the state of the machine, and what level of recovery is possible. 

Table 14-1 shows the general error categories associated with each of these error notifications. 

Table 14-1: Error Summary By Notification Entry Point 

Entry Point             SCB Index (hex)   General Error Categories 

Console Halt            N/A               Interrupt Stack not valid, kernel-mode halt, 
                                          double error, illegal SCB vector 

Machine Check           04                Memory management, interrupt, microcode detected CPU errors, 
                                          CPU stall timeout, 
                                          TB parity errors, VIC tag or data parity errors, 
                                          Uncorrectable data read errors, 
                                          CACK_HERR on read 

Soft Error Interrupt    54                VIC tag or data parity errors, 
                                          Pcache tag or data parity errors, 
                                          Bcache tag parity error on read, 
                                          Uncorrectable data read errors, 
                                          Correctable data errors 

Hard Error Interrupt    60                Uncorrectable data errors on write operations, 
                                          Bcache tag parity error on writes, 
                                          CACK_HERR on writes 



14.3 Error Handling and Recovery 

All errors (except those resulting in console halt) go through SCB vector entry points and are 
handled by service routines provided by the operating system. A console halt transfers control to the 
address of the CONSOLE_HALT register. Software driven recovery or retry is not recommended 
for errors resulting in console halt. 

Software error handling (by operating system routines) can be logically divided into the following 
steps: 

• State collection. 

• Analysis. 

• Recovery. 

• Retry. 

These steps are discussed in general in the next four sections. After that, details are supplied on 
analysis, recovery and retry for each error event which results in an exception or interrupt. This 
information is organized by SCB entry point. 

14.3.1 Error State Collection 

Before error analysis can begin, all relevant state must be collected. The stack frame provides 
the PC/PSL pair for all exceptions and interrupts. For machine checks, the stack frame also 
provides details about the error. 

In addition to the stack frame, machine checks and hard and soft error interrupts usually require 
analysis of other registers. It is strongly recommended that all the state listed below be read 
and saved in these cases. State is saved prior to analysis so that analysis is not complicated by 
changes in state in the registers as the analysis progresses, and so that errors incurred during 
analysis and recovery can be processed with that context. 

Ibox 

ICSR: Ibox (VIC) control and status register. 
VMAR: VIC memory address register. 

Ebox 

ECR: Ebox control and status register. 

Mbox 

TBSTS: TB status register. 
TBADR: TB address register. 
PCSTS: Pcache status register. 
PCADR: Pcache address register. 

Cbox 

BIU_STAT: Bus or Fill error status. 
BC_TAG: Contains tag of tag_parity, control_parity, or fill error. 
BIU_ADDR: Address associated with cache probe or bus error (BIU_HERR, BIU_SERR, 
BC_TPERR, BC_TCPERR). 
FILL_ADDR: Address associated with fill error, FILL_ECC or FILL_DPERR. 
FILL_SYNDROME: Syndrome bits associated with FILL_ADDR. 



NOTE 

The ERROR interrupt is level sensitive, requiring the clearing of the external ERR_H 
signal if the interrupt source is external to NVAX Plus, and the clearing of the 
BIU_STAT indication resulting in the internal H_ERR signal to clear the interrupt. 
The error bits in the BIU_STAT register are W1C, and therefore should be cleared 
after BIU_STAT is read, so that errors incurred during analysis and recovery can be 
processed with that context. 

For the purposes of the rest of this chapter, it is assumed that each of these states is saved in a 
variable whose name is constructed by prepending "S_" to the register name. For example, the 
ICSR would be saved in the variable S_ICSR. 
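
The write-one-to-clear (W1C) behavior mentioned in the NOTE is worth illustrating, because it determines how a handler should clear BIU_STAT. The sketch below is a generic software model of a W1C register, not a description of the BIU_STAT bit layout: the class name and the bit positions used in the usage note are hypothetical, and every bit is modeled as W1C for simplicity.

```python
class W1CRegister:
    """Minimal model of a 32-bit write-one-to-clear register."""

    def __init__(self, value: int = 0) -> None:
        self.value = value & 0xFFFFFFFF

    def read(self) -> int:
        return self.value

    def write(self, data: int) -> None:
        # Writing a 1 clears that bit; writing a 0 leaves it unchanged.
        self.value &= ~data & 0xFFFFFFFF

    def hardware_set(self, bits: int) -> None:
        # Models the chip latching new error bits.
        self.value |= bits & 0xFFFFFFFF
```

Writing back exactly the value that was read clears only the errors that were saved. An error latched between the read and the write stays set, so it is still visible to a later pass of the handler.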

The following example shows allocation of memory storage for the error state. 

; ERROR STATE COLLECTION DATA STORAGE

; IBOX
S_ICSR:           .LONG   0     ; IBOX VIC CONTROL AND STATUS REGISTER
S_VMAR:           .LONG   0     ; IBOX VIC ERROR ADDRESS REGISTER

; EBOX
S_ECR:            .LONG   0     ; EBOX CONTROL AND STATUS REGISTER

; MBOX
S_TBSTS:          .LONG   0     ; TB STATUS REGISTER
S_TBADR:          .LONG   0     ; TB ERROR ADDRESS REGISTER
S_PCSTS:          .LONG   0     ; PCACHE STATUS REGISTER
S_PCADR:          .LONG   0     ; PCACHE ERROR ADDRESS REGISTER

; CBOX
S_BIU_STAT:       .LONG   0     ; Bus or Fill error status
S_BC_TAG:         .LONG   0     ; Contains tag of tag_parity, control_parity, or fill error
S_BIU_ADDR:       .LONG   0     ; Address associated with BIU_HERR, BIU_SERR, BC_TPERR, BC_TCPERR
S_FILL_ADDR:      .LONG   0     ; Address associated with fill error, FILL_ECC or FILL_DPERR
S_FILL_SYNDROME:  .LONG   0     ; Syndrome bits associated with FILL_ADDR

The following example shows collection of error state, which would normally be done early in the
error handling routine. If a second bus or fill error is detected, the SEO second error bit is set,
but the error address and status are lost.

;SAVE ALL ERROR STATE UPON ENTRY TO ERROR HANDLING ROUTINE

SAVE_STATE:

        ;CBOX
        MFPR    #PR19$_BIU_STAT, S_BIU_STAT
        MFPR    #PR19$_BIU_ADDR, S_BIU_ADDR
        MFPR    #PR19$_FILL_ADDR, S_FILL_ADDR
        MFPR    #PR19$_FILL_SYNDROME, S_FILL_SYNDROME
        MFPR    #PR19$_BC_TAG, S_BC_TAG

        ;IBOX
        MFPR    #PR19$_ICSR, S_ICSR
        MFPR    #PR19$_VMAR, S_VMAR

        ;EBOX
        MFPR    #PR19$_ECR, S_ECR

        ;MBOX
        MFPR    #PR19$_TBSTS, S_TBSTS
        MFPR    #PR19$_TBADR, S_TBADR
        MFPR    #PR19$_PCSTS, S_PCSTS
        MFPR    #PR19$_PCADR, S_PCADR

        ;SYSTEM ENVIRONMENT
        ;COLLECTION OF SYSTEM ENVIRONMENT ERROR REGISTERS GOES HERE

Additional state collection is recommended while/after flushing the Bcache because certain errors
may occur as a result of the flush operation.

For the purposes of the rest of this chapter, it is assumed that each of these states is saved in a
variable whose name is constructed by prepending "SS_" to the register name. For example, the
BIU_STAT register would be saved in the variable SS_BIU_STAT.

14.3.2 Error Analysis

With the error state obtained during the collection process, the error condition can be analyzed. 
The purpose is to determine what error event caused the particular notification being handled (to 
the extent possible), and what other errors may also have occurred. Analysis of machine checks 
and hard and soft error interrupts should be guided by the parse trees given in the appropriate 
sections below. 

NOTE 

Errors detected in or by one of the caches usually result in the cache automatically
being disabled. However, to minimize the possibility of nested errors, it is suggested
that error analysis and recovery for memory or cache-related errors be performed with
the Pcache disabled and the Bcache disabled (i.e., BIU_CTL<BC_ENA> = 0).

NOTE 

Disabling the Bcache means clearing BIU_CTL<BC_ENA>. This only stops the NVAX
Plus chip from probing external cache. System logic continues to allocate and writeback
blocks for READ_BLOCK and WRITE_BLOCK command requests.

In some cases, a notification for a single error occurs in two ways. For example, an uncorrectable 
error in the Bcache data RAMs will cause a soft error interrupt and may also cause a machine 
check. **Software should handle cases where a machine check handler clears error bits and then 
the soft error handler is entered with no error bits set.** 

In general an error reporting register can report events which lead to machine check, soft error, 
or hard error. A given error event can result in machine check and soft error interrupt, or in 
just one or the other. Events which lead to hard error interrupts generally can not also cause 
machine check or soft error interrupt. However, if a hard error occurs from a write operation, a 
subsequent read error can result in a machine check with a SEO bit set. 

Multiple simultaneous errors may make useful recovery impossible. However, in cases where no 
conflict exists in the reporting of the multiple errors (i.e., separate Pcache and Bcache errors), 
and recovery from each error is possible, then recovery from the set of errors is accomplished by 
recovering from both of them. For example, recovery from a Pcache tag parity error and FILL 
correctable data error being reported together is possible by following the recovery procedures for 
each error in sequence. 

The error cause determination parse tree for machine check exception is directed at causes or 
possible causes of machine checks. It ignores errors which lead to hard or soft error interrupts 
but not to machine checks. Similarly, the hard error interrupt cause determination ignores 
errors which lead to machine check or soft error interrupt, and the soft error interrupt cause 
determination ignores errors which lead to machine check or hard error interrupt. 

There is a natural order between machine check, hard error interrupt, and soft error interrupt 
because the IPL for hard error interrupts is higher than that of soft error interrupts and the IPL 
in the machine check exception is higher than either of the error interrupts. This hierarchy is 
important because knowledge of which notification event occurred is used to discriminate between 
certain error events (e.g., an error on the initial fill quadword for a read-lock is distinguished from 
a fill error on a subsequent quadword by the fact of machine check notification). 

14.3.3 Error Recovery

Recovery from errors consists of clearing any latched error state, repairing damaged state (if 
necessary and possible), and restoring the system to normal operation. There are special consid- 
erations involved in analysis and recovery from cache or memory errors, which are covered in the 
next sections. 

Recovery from multiple error scenarios is possible when there is no conflict in the error regis- 
ters which report the errors and there is no conflict in the recovery procedures for the errors. 
However, all recovery procedures in this chapter assume that only one error is present. None of
the procedures is valid in multiple error scenarios without further analysis.

In some instances, it may be desirable to stop using the hardware which is the source of a large 
number of errors. For example, if a cache reports a large number of errors, it may be better to 
disable it. It is suggested that software maintain error counts which should be compared against 
error thresholds on every error report. If the count (per unit time) exceeds the threshold, the 
hardware should be disabled. 
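
The threshold check suggested above could be sketched as follows. This is only an illustration in Python rather than production error-handler code; the class name, the sliding-window policy, and the threshold values are assumptions and are not specified by this chapter.

```python
from collections import deque


class ErrorRateMonitor:
    """Hypothetical helper: counts error reports inside a sliding time window."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold            # assumed policy value
        self.window_seconds = window_seconds  # assumed policy value
        self._timestamps = deque()

    def report(self, now):
        """Record one error at time `now`; return True if the unit should be disabled."""
        self._timestamps.append(now)
        # Discard reports that have aged out of the window.
        while self._timestamps and now - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps) > self.threshold
```

One monitor per hardware unit (for example, one for the Pcache and one for the Bcache) keeps the disable decision for each unit independent.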

14.3.3.1 Special Considerations for Cache and Memory Errors 

Cache and memory error recovery requires special considerations: 

• Cache and memory error recovery should always be done with the Pcache and VIC off. 

• Bcache flush should always be done one block at a time, recapturing the relevant error
registers between each block flush.

• Cache coherence requires a specific procedure for re-enabling the caches. See Section 14.3.3.1.1,
Cache Coherence in Error Handling.

• Error recovery should be performed starting with the most distant component and working
toward the CPU and Ebox. System environment memory errors should be processed first,
followed by Bcache tag store and data RAM errors, Pcache errors, TB errors, and, finally, VIC errors.

• BIU and FILL errors are cleared by writing the write-one-to-clear bits in BIU_STAT.

• Pcache tag and data store errors are cleared by writing the write-one-to-clear bits in PCSTS. 
The suggested way to do this is to write a one to the specific error bit. Pcache flush is necessary 
after Pcache tag store parity errors. See Section 14.3.3.1.1.1, Cache Enable, Disable, and 
Flush Procedures. 

• TB errors are cleared by writing the write-one-to-clear bits in TBSTS. The suggested way to
do this is to write a one to the specific error bit. 

• PTE read errors are cleared by writing the PTE error write-one-to-clear bits in PCSTS. The 
suggested way to do this is to write a one to the specific error bit. 

• VIC errors are cleared by writing the write-one-to-clear bits in ICSR. The suggested way 
to do this is to write a one to the specific error bit. VIC flush and re-enable is necessary 
after VIC tag store parity errors. See Section 14.3.3.1.1.1, Cache Enable, Disable, and Flush 
Procedures. 
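
The write-one-to-clear behaviour that the items above rely on can be modelled with a short sketch. The Python class below is a hypothetical software model, not the register hardware; the bit positions and the mask in the usage are invented for illustration. It shows why writing back only the specific error bits from the saved copy leaves untouched any error that was latched after the state was collected.

```python
class W1CRegister:
    """Hypothetical model of a 32-bit register with write-one-to-clear error bits."""

    def __init__(self, value=0, w1c_mask=0xFFFFFFFF):
        self.value = value & 0xFFFFFFFF
        self.w1c_mask = w1c_mask  # which bit positions are W1C (assumed layout)

    def read(self):
        return self.value

    def write(self, data):
        # A 1 written to a W1C position clears that bit; a 0 leaves it unchanged.
        self.value &= ~(data & self.w1c_mask) & 0xFFFFFFFF


def clear_saved_errors(reg, saved):
    """Clear only the error bits that were present in the saved copy."""
    reg.write(saved & reg.w1c_mask)
```

Writing all ones instead would also clear errors that arrived during analysis, which is what writing the specific error bit avoids.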

14.3.3.1.1 Cache Coherence in Error Handling

Certain procedures must be followed in order to maintain cache coherence while enabling NVAX 
caches. Since many errors cause caches to be disabled, and since cache and memory error recovery 
is normally done with the Pcache and VIC off, the complete cache enable procedure is done as 
part of recovery from all cache and memory errors. 

The VIC (virtual instruction cache) is not automatically kept coherent with memory. It is flushed 
as a side effect of the REI instruction (as required by the VAX architecture). Normally in error 
recovery, there is no definite need to flush the VIC. For consistency and for the sake of beginning
error retry in a known state, flushing the VIC during error recovery is recommended. However, 
in the event of VIC tag parity errors, the complete VIC flush procedure described in the next 
section must be done. 

The TB is not automatically kept coherent with memory. Software uses the TBIS and TBIA 
functions to maintain coherence, and the LDPCTX instruction clears the process PTEs in the 
TB. Normally in error recovery, there is no definite need to flush the TB. For consistency and 
for the sake of beginning error retry in a known state, flushing the TB during error recovery is 
recommended. When a TB parity error occurs, Mbox hardware flushes the TB by itself (via an 
internally generated TBIA), but it would be appropriate for software to test the TB after a parity 
error. This is discussed in Section 14.3.3.1.2. 

14.3.3.1.1.1 Cache Enable, Disable, and Flush Procedures 

To enable the NVAX Plus caches, the caches are flushed and enabled in a specific order. The 
ordering is necessary for coherence between the Bcache, Pcache, and memory. For simplicity, one
procedure is given for enabling the NVAX Plus caches, even though variations on the procedure 
may also produce correct results. Disabling the caches can be done in any order, though one 
procedure is given here. 

In error handling, the VIC and Pcache are disabled. 

14.3.3.1.1.1.1 Disabling the NVAX Plus Caches for Error Handling 

This is the procedure for disabling the NVAX Plus caches: 

NOTE 

These procedures will be supplied with MACRO coding examples. 

• Disable the VIC: 

TBS (MTPR to ICSR) 

• Disable the Pcache: 

TBS (MTPR to PCCTL) 

• Disable the Bcache: 

TBS (MTPR to BIU_CTL) 

14.3.3.1.1.1.2 Enabling the NVAX Caches

The procedure for enabling the NVAX caches after an error is the same as is used to initialize the 
caches after power-up. This procedure ensures that error retry/restart occurs with the caches in 
a known state. The procedure is outlined below. 

• All of the caches, including the Bcache, must be disabled.

• Flush the Bcache 

• Enable the Bcache (MTPR to BIU_CTL).

• Flush the Pcache (Loop on MTPR to PCTAG IPRs). 

• Enable the Pcache (MTPR to PCCTL). 

• Flush the TB: 

MTPR #0, #PR19$_TBIA

• Flush the VIC (Loop on MTPRs to VMAR and VTAG, writing different initial values into the 
left and right banks). 

• Enable the VIC (MTPR to ICSR). 

14.3.3.1.1.2 Extracting Data from the Bcache 

To extract data from the Bcache, the Bcache is placed in FORCE_HIT mode. 

After the Bcache is flushed, set the Bcache in FORCE_HIT mode and extract the data. Note that
the code which executes this procedure and its local data must be in I/O space. The TB entries
(PTEs) which map this code and local data must be fixed in the TB. (This is most easily done
by flushing the TB via an MTPR to TBIA and then accessing all the relevant pages in
sequence.) Otherwise Bcache FORCE_HIT will interfere with instruction fetch, operand access,
and PTE fetches in TB miss sequences.

The following instruction places the Bcache in FORCE_HIT mode:

TBS (MTPR to BIU_CTL) 

With the Bcache in FORCE_HIT mode, a read in memory space of any address whose index portion 
matches the index of the cache data will return the data (provided there is no uncorrectable data 
RAM error). This is most easily accomplished by reading from the true address of the data. 

NOTE 

In FORCE_HIT mode, Fill ECC errors are detected **(unless a DIAG_CTL<DISABLE_ERRORS>
function is enabled)**. Software should prepare for an ECC error
(BIU_STAT<FILL_ECC>).

14.3.3.1.2 Cache and TB Test Procedures 

TBS 

OUTLINE OF TO-BE-SPECIFIED TEST PROCEDURES 

Testing is generally done using the force hit mode of a cache. The code and data of
the test procedure must reside in I/O space. Assuming memory management is enabled
during this procedure, the needed PTEs must be in the TB before entering force hit
mode in the Pcache or Bcache. For the Bcache, testing should be done with errors
disabled **(DIAG_CTL<DISABLE_ERRORS> enabled)**. The ECC logic should be
tested thoroughly on one location by forcing various check bit patterns and examining
the syndrome latched on the read (**FILL_SYNDROME** is loaded on every read in
Bcache disable-errors mode). At present, FILL_SYNDROME is valid only if an error
occurs; otherwise, the syndrome bits for the last fill cannot be recovered with an
IPR_RD of this register. Pcache and VIC parity checking should be tested by writing bad
parity into the arrays. TB testing may be accomplished by writing to MTBTAG and
MTBPTE (with care not to change any TB entry necessary for the test code and data
and not to cause two TB entries to exist for one address). PROBER and PROBEW
(setting PSL<PRV_MOD>) are then used to verify the protection bits. Testing the
modify bit would be difficult, though approaches exist.

14.3.4 Error Retry

Error retry is a function of the error notification (machine check or error interrupt), error type, 
and error state. The sections below specify the conditions under which the instruction stream 
may be restarted. 

If retry is to be attempted, the stack must be trimmed of all parameters except the PC/PSL pair. 
This is necessary only for machine checks, because error interrupts do not provide any additional 
parameters on the stack. An REI will then restart the instruction stream and retry the error. 
Some form of software loop control should be provided to limit the possibility of an error loop. 
Note that pending error interrupts may be taken before the retry occurs, depending on the IPL 
of the interrupted or machine checked code. 
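
The software loop control mentioned above could take many forms. As one possible policy, sketched in Python with a hypothetical class name and an assumed retry limit, the handler counts consecutive retries of the same PC and refuses further retries once the limit is reached.

```python
class RetryLimiter:
    """Hypothetical loop control: bounds consecutive retries of one PC."""

    def __init__(self, max_retries):
        self.max_retries = max_retries  # assumed policy value
        self._last_pc = None
        self._count = 0

    def allow_retry(self, pc):
        """Return True if a retry at `pc` is still within the limit."""
        if pc == self._last_pc:
            self._count += 1
        else:
            # A different PC starts a new sequence of retries.
            self._last_pc = pc
            self._count = 1
        return self._count <= self.max_retries
```

When `allow_retry` returns False, the error would be treated as unrecoverable for the affected process or system rather than retried again.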

Strictly speaking, an REI from a hard or soft error interrupt handler is not a retry since these 
interrupts are recognized between macroinstructions. A machine check exception is an instruction 
abort, and an REI from the handler will cause the failing instruction to be retried (provided retry 
is indicated by analysis). What these cases all have in common is that the interrupted instruction 
stream is restarted. This is only done when the result of error analysis and recovery is such that 
all damaged state has been repaired and there is no reason to suspect that incorrect results will 
be produced if the image is restarted and another error does not occur. 

If complete recovery from one or more errors is not possible (i.e., some state is lost or it is 
impossible to determine what state is lost), possibly the entire system will have to be crashed, a 
single process will have to be deleted, or some other action will have to be taken. Software must 
determine if the error is fatal to the current process, to the processor, or to the entire system, 
and take the appropriate action. 

It is expected that software handles machine checks, soft error interrupts, and hard error inter- 
rupts independently. For example, after handling a machine check from which retry is to occur, 
software does not check for errors which might cause a pending hard or soft error interrupt. Since 
the HARD ERROR interrupt is level sensitive, the machine check code must not clear BIU_STAT
if the interrupt is to be taken. The machine check handler is exited via REI (after trimming the 
machine check information off the stack). If the IPL of the machine checked instruction stream 
is low enough, any pending hard or soft error interrupt is taken before the retry occurs. However, 
if the interrupted instruction stream was running at high IPL, then it will continue oblivious of 
remaining errors. 

14.3.4.1 General Multiple Error Handling Philosophy

Multiple errors may be reported at the same time. In some cases the NVAX Plus pipeline will 
contain multiple operand prefetches to the same memory block. This can cause multiple errors 
from a single non-transient failure. It could also occur that two separate errors occur at nearly 
the same time and are thus reported simultaneously. 

Multiple error scenarios may be grouped into the following three classes: 

1. Multiple distinct errors for which no error report interferes with the analysis of any other 
(e.g., no lost error bits set). 

2. Multiple errors which could have been caused by the NVAX Plus pipeline issuing more than 
one reference to a given block before the error interrupt or machine check forced a pipeline 
flush. 

3. Multiple errors for which analysis is complicated because the reports interfere with each 
other. 

It is the intent of this chapter to recover from class 1 (above) by simply treating the errors as 
separate and recovering from each in turn. Retry or restart evaluation is based on the cumulative 
result of the recovery and repair procedures for each error. 

For class 2, specific cases are identified in which lost errors are tolerated. These cases are selected 
because the NVAX Plus pipeline can easily cause them (given one error), and because sufficient 
safeguards exist to ensure that correct operation is maintained. 

NOTE 

If BIU_STAT<lost_write_err> is clear and BIU_STAT<FILL_SEO> is set with
ARB_CMD being a read, then write data has not been lost, and the system can be retried
after the cache is flushed.

Class 3 scenarios are generally not considered recoverable. The system is simply crashed in those 
cases. 

14.4 Console Halt and Halt Interrupt

A console halt is not an exception, but rather a transfer of control by the NVAX Plus microcode 
directly into console macrocode at the address of the Console_Halt IPR. Console halts are
initiated at powerup, by certain microcode-detected double error conditions, and by the assertion 
of the external halt interrupt pin, HALT_H. 

There is no exception stack frame associated with a console halt. Instead, the SAVPC and SAVPSL 
processor registers provide the necessary information. The format of SAVPC (IPR 42) is shown 
in Figure 14—1. 



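Figure 14-1: Console Saved PC

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+-----------------------------------------------------------------------------------------------+
|                                           Saved PC                                            | :SAVPC
+-----------------------------------------------------------------------------------------------+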



The PSL, halt code, MAPEN<0>, and a validity bit are saved in SAVPSL (IPR 43). The format 
of SAVPSL is shown in Figure 14—2. The halt codes are shown in Table 14—2. 

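Figure 14-2: Console Saved PSL

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+-----------------------------------------------+--+--+-----------------+-----------------------+
|                  PSL<31:16>                   |  |  |    Halt Code    |       PSL<7:0>        | :SAVPSL
+-----------------------------------------------+--+--+-----------------+-----------------------+
                                                  |  |
                                MAPEN<0> ---------+  |
                     Invalid SAVPSL if 1 ------------+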



The possible halt codes that may appear in SAVPSL<13:8> are listed in Table 14-2. 



Table 14-2: Console Halt Codes

Mnemonic                 Code (Hex)   Meaning

ERR_HLTPIN               02           HALT_H pin asserted
ERR_PWRUP                03           Initial power up
ERR_INTSTK               04           Interrupt stack not valid
ERR_DOUBLE               05           Machine check during exception processing
ERR_HLTINS               06           HALT instruction in kernel mode
ERR_ILLVEC               07           Illegal SCB vector (bits <1:0> = 11)
ERR_WCSVEC               08           WCS SCB vector (bits <1:0> = 10)
ERR_CHMFI                0A           CHMx on interrupt stack
ERR_IE0                  10           ACV/TNV during machine check processing
ERR_IE1                  11           ACV/TNV during kernel-stack-not-valid processing
ERR_IE2                  12           Machine check during machine check processing
ERR_IE3                  13           Machine check during kernel-stack-not-valid processing
ERR_IE_PSL_26_24_101     19           PSL<26:24> = 101 during interrupt or exception
ERR_IE_PSL_26_24_110     1A           PSL<26:24> = 110 during interrupt or exception
ERR_IE_PSL_26_24_111     1B           PSL<26:24> = 111 during interrupt or exception
ERR_REI_PSL_26_24_101    1D           PSL<26:24> = 101 during REI
ERR_REI_PSL_26_24_110    1E           PSL<26:24> = 110 during REI
ERR_REI_PSL_26_24_111    1F           PSL<26:24> = 111 during REI
ERR_SELFTEST_FAILED      3F           Microcoded powerup selftest failed
At the time of the halt, the current stack pointer is saved in the appropriate IPR (0 to 4), 
and SAVPSL<31:16,7:0> are loaded from PSL<31:16,7:0>. SAVPSL<15> is set to MAPEN<0>. 
SAVPSL<14> is set to 0 if the PSL is valid and to 1 if it is not (SAVPSL<14> is undefined after 
a halt due to a system reset). SAVPSL<13:8> is set to the console halt code.
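
As an illustration of the SAVPSL layout just described, the following Python sketch splits a saved value into its fields. The function and dictionary names are hypothetical; the field positions and halt codes are taken from the description above and Table 14-2. Console macrocode would of course do this in VAX MACRO, so this is only a model of the decoding.

```python
HALT_CODES = {
    0x02: "ERR_HLTPIN", 0x03: "ERR_PWRUP", 0x04: "ERR_INTSTK",
    0x05: "ERR_DOUBLE", 0x06: "ERR_HLTINS", 0x07: "ERR_ILLVEC",
    0x08: "ERR_WCSVEC", 0x0A: "ERR_CHMFI", 0x10: "ERR_IE0",
    0x11: "ERR_IE1", 0x12: "ERR_IE2", 0x13: "ERR_IE3",
    0x19: "ERR_IE_PSL_26_24_101", 0x1A: "ERR_IE_PSL_26_24_110",
    0x1B: "ERR_IE_PSL_26_24_111", 0x1D: "ERR_REI_PSL_26_24_101",
    0x1E: "ERR_REI_PSL_26_24_110", 0x1F: "ERR_REI_PSL_26_24_111",
    0x3F: "ERR_SELFTEST_FAILED",
}


def decode_savpsl(savpsl):
    """Split a 32-bit SAVPSL value into the fields described in the text."""
    halt_code = (savpsl >> 8) & 0x3F            # SAVPSL<13:8>
    return {
        "psl": savpsl & 0xFFFF00FF,             # PSL<31:16> and PSL<7:0>
        "mapen": (savpsl >> 15) & 1,            # SAVPSL<15> = MAPEN<0>
        "psl_invalid": bool((savpsl >> 14) & 1),  # SAVPSL<14>
        "halt_code": halt_code,
        "halt_name": HALT_CODES.get(halt_code, "unknown"),
    }
```

Note that `psl_invalid` is undefined after a halt due to a system reset, so a caller would ignore it when the halt code is 03.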

To complete the hardware restart sequence and thereby pass control to the console macrocode, 
the state shown in Table 14—3 is initialized. 



Table 14-3: CPU State Initialized on Console Halt

State          Initialized Value

SP             IPR 4 (IS)
PSL            041F0000 (hex)
PC             from CONSOLE_HALT IPR
MAPEN          0
ICCS           0 (after reset, code=3, only)
SISR           0 (after reset, code=3, only)
ASTLVL         4 (after reset, code=3, only)
PAMODE         0 (after reset, code=3, only)
BPCR<31:16>    FECA (hex) (after reset, code=3, only)
CPUID          0 (after reset, code=3, only)
all else       undefined

14.5 Machine Checks



The machine check exception indicates a serious system error. Under certain conditions, the error 
may be recoverable by restarting the instruction. The recoverability is a function of the machine 
check code, the VAX Restart bit (VR) in the machine check stack frame, the opcode, the state of 
PSL<FPD>, the state of certain second-error bits in internal error registers, and most probably, 
the external error state. 

A machine check results from an internally detected consistency error (e.g., the microcode reaches 
an "impossible" state), or a hardware detected error (e.g., an uncorrectable FILL_ECC error on a
data read). 

A machine check is technically a macro instruction abort. The NVAX Plus microcode attempts to 
convert the condition to a fault by unwinding the current instruction, but there is no guarantee 
that the instruction can be properly restarted. As much diagnostic information as possible is 
pushed on the stack and provided in other error registers. The rest of the error parsing is then 
left to the operating system. 

When the software machine check handler receives control, it must explicitly acknowledge receipt 
of the machine check with the following instruction: 

MTPR #0, #PR19$_MCESR

14.5.1 Machine Check Stack Frame 

The machine check stack frame is shown in Figure 14—3. The fields of the stack frame are 
described in Table 14—4, and the possible machine check codes are listed in Table 14—5. The 
contents of all fields not explicitly defined in Table 14-4 are UNDEFINED. 



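Figure 14-3: Machine Check Stack Frame

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+-----------------------------------------------------------------------------------------------+
|                  24 (byte count of parameters, not including this longword)                   |
+-----------------------------------------------------------------------------------------------+
| ASTLVL |  x x x x x   |  Machine Check Code   |    x x x x x x x x    |         CPUID         |
+-----------------------------------------------------------------------------------------------+
|                                       INT.SYS register                                        |
+-----------------------------------------------------------------------------------------------+
|                                        SAVEPC register                                        |
+-----------------------------------------------------------------------------------------------+
|                                          VA register                                          |
+-----------------------------------------------------------------------------------------------+
|                                          Q register                                           |
+-----------------------------------------------------------------------------------------------+
|    Rn     | x x |Mode |        Opcode         |    x x x x x x x x    |VR|   x x x x x x x    |
+-----------------------------------------------------------------------------------------------+
|                                              PC                                               |
+-----------------------------------------------------------------------------------------------+
|                                              PSL                                              |
+-----------------------------------------------------------------------------------------------+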

Table 14-4: Machine Check Stack Frame Fields

Longword   Bits    Contents

(SP)+0     31:0    Byte count: This longword contains the size of the stack frame in bytes, not
                   including the PC, PSL, or the byte count longword. Stack frame PC and PSL
                   values should always be referenced using this count as an offset from the stack
                   pointer.

(SP)+4     31:29   ASTLVL: This field contains the current value of the VAX ASTLVL register.

           23:16   Machine check code: This field contains the reason for the machine check,
                   as listed in Table 14-5.

           7:0     CPUID: This field contains the current value of the VAX CPUID register.

(SP)+8     31:0    INT.SYS register: This longword contains the value of the INT.SYS register
                   as read onto the Abus by the microcode. The fields in this register are
                   described in Chapter 10, the Interrupt Section chapter, of the NVAX Plus chip
                   specification.

(SP)+12    31:0    SAVEPC: This field contains the SAVEPC register, which is loaded by microcode
                   with the PC value in certain circumstances. It is used in error handling for PTE
                   read errors with PSL<FPD> set in this stack frame.

(SP)+16    31:0    VA register: This longword contains the contents of the Ebox VA register, which
                   may be loaded from the output of the ALU.

(SP)+20    31:0    Q register: This longword contains the contents of the Ebox Q register, which
                   may be loaded from the output of the shifter.

(SP)+24    31:28   Rn: This field contains the value of the Rn register, which is used to obtain the
                   register number for the CVTPL and EDIV instructions. In general, the value
                   of this field is UNPREDICTABLE.

           25:24   Mode: This field contains a copy of PSL<CUR_MOD>.

           23:16   Opcode: This field contains bits <7:0> of the instruction opcode. The FD bit is
                   not included.

           7       VR: This field contains the VAX Restart bit, which is used to communicate
                   restart information between the microcode and the operating system. If this
                   bit is set, no architectural state has been changed by the instruction which was
                   executing when the error was detected. If this bit is not set, architectural state
                   was modified by the instruction.
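
To illustrate how the stack frame fields might be picked apart, here is a minimal Python sketch that treats the frame as a list of longwords starting at (SP). The function name and the list representation are assumptions made for the example; the field positions follow Table 14-4, and the PC and PSL are located through the byte count, as the table recommends, rather than at fixed offsets.

```python
def parse_mcheck_frame(longwords):
    """Decode a machine check stack frame given as longwords from (SP) upward."""
    byte_count = longwords[0]
    params = byte_count // 4        # number of parameter longwords
    w4 = longwords[1]               # (SP)+4
    w24 = longwords[6]              # (SP)+24
    return {
        "byte_count": byte_count,
        "code": (w4 >> 16) & 0xFF,      # machine check code, bits 23:16
        "cpuid": w4 & 0xFF,             # bits 7:0
        "int_sys": longwords[2],
        "savepc": longwords[3],
        "va": longwords[4],
        "q": longwords[5],
        "rn": (w24 >> 28) & 0xF,        # bits 31:28
        "mode": (w24 >> 24) & 0x3,      # bits 25:24
        "opcode": (w24 >> 16) & 0xFF,   # bits 23:16
        "vr": (w24 >> 7) & 1,           # bit 7
        # PC and PSL follow the parameters; use the byte count as the offset.
        "pc": longwords[1 + params],
        "psl": longwords[2 + params],
    }
```

The fixed indices for the parameter longwords assume the 24-byte parameter area shown in Figure 14-3.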

Table 14-5: Machine Check Codes

Mnemonic                Code (Hex)   Meaning

MCHK_UNKNOWN_MSTATUS    01           Unknown memory management fault parameter returned by
                                     the Mbox (see Section 14.5.2.1)
MCHK_INT.ID_VALUE       02           Illegal interrupt ID value returned in INT.SYS (see
                                     Section 14.5.2.2)
MCHK_CANT_GET_HERE      03           Illegal microcode dispatch occurred (see Section 14.5.2.3)
MCHK_MOVC.STATUS        04           Illegal combination of state bits detected during string
                                     instruction (see Section 14.5.2.4)
MCHK_ASYNC_ERROR        05           Asynchronous hardware error occurred (see Section 14.5.2.5)
MCHK_SYNC_ERROR         06           Synchronous hardware error occurred (see Section 14.5.2.6)



14.5.2 Events Reported Via Machine Check Exceptions 

This section describes all the errors which can cause a machine check exception. A parse tree is
given which shows how to determine the cause of a given machine check. After that, there is a 
description of each error. For each error, the recovery procedure is given. Where appropriate, the 
conditions for retry are given. See Section 14.3.3 and Section 14.3.4 for more on error recovery 
and error retry. 

Figure 14-4 is a parse tree which should be used to analyze the cause of a machine check exception.
The errors shown in the parse tree are described in detail in the sections following the figure.
The section is indicated in parentheses with each error. Note that it is assumed that the state
being analyzed is the saved state, as described in Section 14.3.1. Otherwise the state could change
during the analysis procedure, leading to possibly incorrect conclusions. (See Section 14.3.2 for 
general information about error analysis.) 

Figure 14-4: Cause Parse Tree for Machine Check Exceptions

MACHINE CHECK (select one)
|
+- MCHK_UNKNOWN_MSTATUS --> Unknown memory management status error (Section 14.5.2.1)
|
+- MCHK_INT.ID_VALUE --> Illegal interrupt ID error (Section 14.5.2.2)
|
+- MCHK_CANT_GET_HERE --> Presumed impossible microcode address reached (Section 14.5.2.3)
|
+- MCHK_MOVC.STATUS --> MOVCx status encoding error (Section 14.5.2.4)
|
+- MCHK_ASYNC_ERROR (select all, at least one)
|  |
|  +- S_TBSTS<LOCK> (select all)
|  |  |
|  |  +- S_TBSTS<DPERR> --> TB PTE data parity error (Section 14.5.2.5.1)
|  |  +- S_TBSTS<TPERR> --> TB tag parity error (Section 14.5.2.5.1)
|  |  +- none of the above --> Inconsistent status (no TBSTS error bits set) (Section 14.5.2.7)
|  |
|  +- S_ECR<S3_STALL_TIMEOUT> --> S3 stall timeout error (Section 14.5.2.5.2)
|  |
|  +- none of the above --> Inconsistent status (no asynchronous machine check error bit set)
|                           (Section 14.5.2.7)
|
+- MCHK_SYNC_ERROR (select all, at least one)
|  |
|  +- S_ICSR<LOCK> (select all, at least one)
|  |  |
|  |  +- S_ICSR<DPERR0> --> VIC (virtual instruction cache) data parity error in bank 0
|  |  |                     (Section 14.5.2.6.1)
|  |  +- S_ICSR<TPERR0> --> VIC tag parity error in bank 0 (Section 14.5.2.6.1)
|  |  +- S_ICSR<DPERR1> --> VIC data parity error in bank 1 (Section 14.5.2.6.1)
|  |  +- S_ICSR<TPERR1> --> VIC tag parity error in bank 1 (Section 14.5.2.6.1)
|  |  +- none of the above --> Inconsistent status (no ICSR error bits set) (Section 14.5.2.7)
|  |
|  +- S_BIU_STAT<FILL_ECC> AND NOT S_BIU_STAT<FILL_CRD> AND NOT S_PCSTS<PTE_ER> (select one)
|  |  |
|  |  +- S_BIU_STAT<ARB_CMD> = READ --> Uncorrectable ECC error on read (Section 14.5.2.6.2)
|  |  +- S_BIU_STAT<ARB_CMD> = not READ --> Logged error is from previous write
|  |                                        (Section 14.5.2.6.3)
|  |
|  +- S_BIU_STAT<FILL_ERR> AND NOT S_BIU_STAT<CRD> AND S_PCSTS<PTE_ER> (footnote 1) (select one)
|  |  |
|  |  +- (select one)
|  |  |  |
|  |  |  +- S_BIU_STAT<ARB_CMD> = READ --> Uncorrectable ECC error on PTE read
|  |  |  |                                 (Section 14.5.2.6.1.2)
|  |  |  +- S_BIU_STAT<ARB_CMD> = not READ --> Logged error is from previous write
|  |  |                                        (Section 14.5.2.6.3)
|  |  |
|  |  +- S_BIU_STAT<FILL_SEO> AND --> Lost Fill error on PTE Read (Section 14.5.2.6.4)
|  |
|  +- S_BIU_STAT<BIU_HERR or TPERR or TPCERR> AND NOT S_PCSTS<PTE_ER> (select one)
|  |  |
|  |  +- S_BIU_STAT<ARB_CMD> = READ --> Read error (cAck H_ERR or Tag/CTL parity)
|  |  |                                 (Section 14.5.2.6.5)
|  |  +- S_BIU_STAT<ARB_CMD> = not READ --> Logged error is from previous write
|  |                                        (Section 14.5.2.6.3)
|  |
|  +- S_BIU_STAT<BIU_SEO> AND --> Lost BIU error (Section 14.5.2.6.6)
|  |
|  +- none of the above --> Inconsistent status (no cause found for synchronous machine check)
|                           (Section 14.5.2.7)
|
+- otherwise --> Inconsistent status (unknown machine check code) (Section 14.5.2.7)

Notation: 

(select one) 



(select 
(select 



all) 
all. 



at least 



otherwi 
none of 



se 

the above 



Exactly one case must be true. If zero or more than one is 
true, the status is inconsistent. 
More than one case may be true. 

Ail the cases are possible causes of a particular machine check. 
More than one may be true. At least one must be true or the status 
is inconsistent. A case is not considered true if it evaluates tc 
"Not a machine check cause" . 

fall- through case for (select one.) if no other case is true, 
fall-through case for (select all) or (select all, at least one) 
if no other case is true. 
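
The notation can be read as three small evaluation rules. The following is a minimal sketch of those rules only; the function names and the representation of a case as an (outcome, condition) pair are assumptions made for illustration and do not appear in this specification.

```python
# Illustrative sketch of the parse-tree notation; all names are hypothetical.
INCONSISTENT = "inconsistent status"


def select_one(cases, otherwise=None):
    """cases: list of (outcome, condition) pairs.

    Exactly one condition must be true. If none is true, the optional
    'otherwise' outcome is taken; any other result is inconsistent.
    """
    hits = [outcome for outcome, condition in cases if condition]
    if len(hits) == 1:
        return hits[0]
    if not hits and otherwise is not None:
        return otherwise
    return INCONSISTENT


def select_all(cases):
    """Return every outcome whose condition is true (possibly none)."""
    return [outcome for outcome, condition in cases if condition]


def select_all_at_least_one(cases, none_of_the_above=INCONSISTENT):
    """Like select_all, but an empty result takes the fall-through outcome."""
    hits = select_all(cases)
    return hits if hits else [none_of_the_above]
```

A handler built this way would evaluate each node of the tree with one of these helpers and treat any "inconsistent status" result as described in Section 14.5.2.7.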



14.5.2.1 MCHK_UNKNOWN_MSTATUS

Description: An unknown memory management status was returned from the Mbox in response 
to a microcode memory management probe. This is probably due to an internal error in the Mbox, 
Ebox, or microsequencer. 

Recovery procedures: No explicit error recovery is required in response to this error. 

Retry condition: This error can only happen in microcode processing of memory management 
faults for a virtual memory reference. Retry if: 

(VR = 1) OR (PSL<FPD> = 1).
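
The same retry condition recurs for several machine check codes in this chapter. A minimal sketch of the test follows, assuming the handler has already extracted the VR bit and the saved PSL from the machine check stack frame; the function name is hypothetical, and the position of PSL<FPD> (bit 27) is taken from the VAX architecture rather than from this section.

```python
PSL_FPD = 1 << 27  # PSL<FPD> (First Part Done), bit 27 of the VAX PSL


def may_retry(vr, psl):
    """Return True when (VR = 1) OR (PSL<FPD> = 1).

    vr  -- value of the VR bit from the machine check stack frame (0 or 1)
    psl -- PSL longword saved in the machine check stack frame
    """
    return vr == 1 or bool(psl & PSL_FPD)
```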



14.5.2.2 MCHK_INT_ID_VALUE

Description: An illegal interrupt ID was returned in INT.SYS during interrupt processing in
microcode. This is probably due to an internal error in the interrupt hardware, Ebox, or
microsequencer.

Recovery procedures: No explicit error recovery is required in response to this error.

Retry condition: This error can only happen in microcode processing of interrupts, which occurs
between instructions or in the middle of interruptable instructions. Retry if:

(VR = 1) OR (PSL<FPD> = 1).



1 At least one potential PTE cause must be found or the status is inconsistent (see Section 14.5.2.7). Some of the outcomes
indicate a potential synchronous machine check cause which is not a potential PTE read error cause. These errors should
be treated separately.

14.5.2.3 MCHK_CANT_GET_HERE

Description: Microcode execution reached a presumably impossible address. This is probably 
due to a microcode bug or an internal error in the Ebox or microsequencer. 

Recovery procedures: No explicit error recovery is required in response to this error. 

Retry condition: Retry if: 

(VR = 1) OR (PSL<FPD> = 1). 

14.5.2.4 MCHK_MOVC_STATUS

Description: During the execution of MOVCx, the two state bits that encode the state of the 
move (forward, backward, fill) were found set to the fourth (illegal) combination. This is probably 
due to an internal error in the Ebox or microsequencer. 

Recovery procedures: No explicit error recovery is required in response to this error. 

Retry condition: Because the state bits encode the operation, the instruction cannot be
restarted in the middle of the MOVCx. If software can determine that no specifiers have been
overwritten (MOVCx destroys R0-R5 and memory due to string writes), the instruction may be
restarted from the beginning by clearing PSL<FPD>. This should be done only if the source and
destination strings do not overlap and if:

(PSL<FPD> = 1).
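
The restart test for MOVCx can be sketched as follows. This is an illustration under stated assumptions, not the handler's actual logic: the function names are hypothetical, the caller is assumed to have decided separately whether the specifiers are intact, and the string addresses and lengths are assumed to be known to software.

```python
PSL_FPD = 1 << 27  # PSL<FPD>, bit 27 of the VAX PSL


def strings_overlap(src, src_len, dst, dst_len):
    """True if [src, src+src_len) and [dst, dst+dst_len) share any byte."""
    if src_len == 0 or dst_len == 0:
        return False
    return src < dst + dst_len and dst < src + src_len


def movc_restart_psl(psl, src, src_len, dst, dst_len, specifiers_intact):
    """Return the PSL to restart MOVCx from the beginning, or None.

    Restart is allowed only if PSL<FPD> = 1, no specifiers were
    overwritten, and the source and destination strings do not overlap.
    """
    if not (psl & PSL_FPD):
        return None
    if not specifiers_intact:
        return None
    if strings_overlap(src, src_len, dst, dst_len):
        return None
    return psl & ~PSL_FPD  # clear FPD so the instruction starts over
```

Returning None rather than a modified PSL keeps the "do not restart" outcome distinct from any valid PSL value.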

14.5.2.5 MCHK_ASYNC_ERROR 

This machine check code reports serious errors which interrupt the microcode at an arbitrary
point. Many internal machine states (e.g., bits in the PSL, the PC or SP) are questionable. 
Recovery is typically not possible. 

14.5.2.5.1 TB Parity Errors 

Description: Parity errors in tags and PTE data in the TB cause an asynchronous machine 
check by directly forcing a microtrap in the microsequencer. The reference being processed by 
the Mbox may be for an explicit Ebox reference, an operand prefetch or DEST_ADDR reference 
from the specifier queue, or an instruction prefetch from the IREF latch. Also the reference could 
be a read generated by the Mbox within a TB miss for a process space virtual address since 
process page tables are stored in virtual memory (system space). 

Description (TB PTE Data Parity Error): The PTE data portion of a TB entry which hit
had a parity error.

Description (TB Tag Parity Error): The tag portion of a TB entry which hit had a parity
error.

Recovery procedures: To recover, clear TBSTS<LOCK>.

Retry condition: Since the Ibox is nearly always able to issue instruction prefetches, TB parity
errors could occur at practically any time. This makes it impossible to determine what machine
state is incorrect. There is no guarantee that all writes with a different PSL<CUR_MOD>
completed successfully. Therefore even the stack frame PSL<CUR_MOD> can't be used to determine
whether system data is uncorrupted. So retry is not possible. Crash the system.



NOTE 



At this time, a change is being considered in REI (for reasons unrelated to TB parity
errors) which might guarantee that the stack frame PSL<CUR_MOD> value is correct
for TB parity errors. This would mean that if a given TB parity error occurs in user
mode, for example, writes from higher privilege modes must have completed
successfully. In other words, in the event of a TB parity error, it would be known that
all pages protected from writes at the stack frame privilege mode were uncorrupted.
Software could kill all jobs which had access to the potentially corrupted pages instead
of crashing the system. (This might be most feasible for processes incurring TB parity
errors in USER mode.)



14.5.2.5.2 Ebox S3 Stall Timeout Error 

Description: S3 stall timeout errors occur when the Ebox microcode is stalled waiting for some 
result or action which will probably never occur. S4 stalls in the Ebox cause S3 stalls and therefore 
can lead to S3 stall timeout. Additionally, field queue stall and instruction queue stall can cause 
this timeout. (These last two situations are not Ebox pipeline stalls, but they are similar in 
effect.) The timeout can occur in any microflow for a number of reasons. Machine state may be
corrupted. This timeout is probably due to an internal error in NVAX Plus such that one box is 
waiting for another to do something which it isn't going to do. An example would be if the Ebox 
microcode expected one more source specifier than the Ibox delivered. The Ebox will stall until 
the timeout occurs waiting for the Ibox to deliver one more source operand via the source queue. 

S3 timeout errors can be caused by failures of various pipeline control circuits in the Ebox. Also 
a deadlock within a box or across multiple boxes can cause this error. 

Recovery procedures: To recover, clear the S3_STALL_TIMEOUT bit in ECR.

Retry condition: Because this error can occur at any time, it is not possible to determine what
machine state is incorrect. Also, this error should never happen and indicates a serious
failure in the chip. So retry is not possible. Crash the system.

14.5.2.6 MCHK_SYNC_ERROR 

This machine check code reports errors which occur in memory or I/O space instruction fetches or
data reads. Except in the case of PTE read errors, core machine state should be consistent since
microcode has to explicitly access an operand or instruction in order to incur this error. Microcode
does not access memory results or dispatch for a new instruction execution with core machine
state in an inconsistent state.

PTE read errors on write transactions can cause a microtrap at an arbitrary time, and so core 
machine state may be inconsistent. 

Many of the error events described below for synchronous machine check are possible causes. If 
more than one is present, there is no way to determine which actually caused the machine check. 
If exactly one possible cause is discovered, then the machine check may be attributed to that cause. 
The reason multiple causes may be present is that the NVAX Plus chip prefetches instructions
and data. If the CPU branches or takes an exception before using data it has requested, then
the pending machine check is taken as a soft error interrupt (though it might not be recoverable
in the final analysis).

If multiple errors occur, recovery and retry may be possible. It is recommended that retry from
multiple errors be done only if one error report does not interfere with analysis of, and recovery
from, another error.

If two errors are entirely separate, neither interfering with the analysis and recovery of the 
other, then it is acceptable to retry from these errors provided all the error analyses and recovery 
procedures result in a retry indication. 
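
The rule for retrying after multiple errors can be sketched as below. The record type and field names are assumptions introduced for illustration; in particular, how software decides that one error report "interferes" with another is not defined here and is represented only as a flag.

```python
from dataclasses import dataclass


@dataclass
class ErrorAnalysis:
    name: str          # label for the error, e.g. "VIC parity"
    retry_ok: bool     # this error's own analysis ended in a retry indication
    interferes: bool   # its report obscures analysis or recovery of another error


def combined_retry(analyses):
    """Retry only if every error allows retry and none interferes."""
    if not analyses:
        return False
    return all(a.retry_ok and not a.interferes for a in analyses)
```

For example, two independent errors that each yield a retry indication give combined_retry(...) == True, while a single interfering or non-retryable error forces the result to False.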

In several cases, lost errors are tolerated. In each case, the strong tendency to prefetch data
exhibited by the NVAX PLUS pipeline makes the particular lost error likely, given that one error
of that kind occurred. Also, in each case, if data is lost in the lost error, a hard error interrupt
is posted. So these errors are tolerated as long as they do not cause a hard error interrupt. The
BIU_STAT<lost_write_err> bit is maintained to report that errors on write operations have occurred
which are not recorded. If BIU_STAT<lost_write_err> is set, the H_ERR interrupt is asserted.

Errors in opcode or operand specifier fetching are always detected before architecturally visible 
state within the CPU is modified. This means the VR bit from the machine check stack frame 
should be 1. This error handling analysis attempts to recover from multiple errors, so the retry 
condition for each error is made as general as possible. If the machine check handler finds only 
errors of the kind listed here, then VR should be 1 and it is an inconsistent report if it is not (see
Section 14.5.2.7).

• VIC parity errors. 

• uncorrectable ECC FILL errors in I-stream reads. 

• CACK H_ERR in I-stream reads. 

14.5.2.6.1 VIC Parity Errors 

Description: A parity error was detected in the VIC tag or data store in the Ibox. VIC parity
errors cause a machine check when the Ebox microcode requests dispatch to a new instruction
execution microflow or attempts to access an operand within an instruction execution microflow.

VIC Data Parity Errors: A parity error occurred in data bank 0 (DPERR0) or data bank 1
(DPERR1) of the VIC.

VIC Tag Parity Errors: A parity error occurred in tag bank 0 (TPERR0) or tag bank 1 (TPERR1)
of the VIC.

In all cases, the quadword virtual address of the error is in VMAR.

Pending Interrupts: A soft error interrupt should be pending.

Recovery procedures: To recover, disable and flush the VIC by re-writing all the tags (using
the procedure in Section 14.3.3.1.1.1). Also, clear ICSR<LOCK>.

Retry condition: Retry if: 

(VR = 1) OR (PSL<FPD> = 1).

14.5.2.6.2 FILL Uncorrectable ECC Errors

Description (uncorrectable ECC errors): An uncorrectable data error was detected by the
Cbox in an I-stream or D-stream read fill. Uncorrectable data errors are the result of a multiple
bit error in the data read from the Bcache or supplied by the system on a READ_BLOCK.

Description (all cases): S_FILL_ADDR contains the address of the error, and
S_FILL_SYNDROME contains the syndrome calculated by the ECC logic.

Pending Interrupts: A soft error interrupt should be pending.

Recovery procedures (uncorrectable ECC errors): To recover, clear BIU_STAT<FILL_ECC>.

Recovery procedures: Flush the Bcache.

Retry condition: If no writeback error occurs in the Bcache flush, retry if: 

(VR = 1) OR (PSL<FPD> = 1). 

If a writeback error occurs in the Bcache flush, then the data is presumed to be unrecoverable. 
Given that the address is available (no error in the tag store), software should determine if the 
error is fatal to one process or the whole system and take appropriate action. Otherwise, crash 
the system. 
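
The outcome of this analysis can be summarized as a small decision function. This is a sketch only: the function name, the string results, and the boolean inputs are assumptions for illustration, and the case in which no writeback error occurs but the retry condition is false is labeled "no retry" because this section does not say more about it.

```python
def fill_ecc_action(vr, psl_fpd, writeback_error, address_available):
    """Decide what to do after the Bcache flush for a FILL ECC error.

    vr, psl_fpd        -- VR bit and PSL<FPD> from the machine check stack frame
    writeback_error    -- True if the Bcache flush reported a writeback error
    address_available  -- True if the tag store gave a usable address
    """
    if not writeback_error:
        if vr == 1 or psl_fpd == 1:
            return "retry"
        return "no retry"
    if address_available:
        # Data is presumed unrecoverable; scope of the damage is a software decision.
        return "software decides: fatal to one process or whole system"
    return "crash"
```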

14.5.2.6.3 FILL/BIU Write Error

The error reported in BIU_STAT was not on a bus read cycle and is not the cause of the machine
check. BIU_STAT<FILL_SEO> or BIU_STAT<BIU_SEO> should be set, and the unlogged error which
set it may be the machine check cause. Refer to Section 14.5.2.6.4 for Lost Fill errors and to
Section 14.5.2.6.6 for Lost BIU errors.

14.5.2.6.4 Lost Fill Error 

Description: Some fill errors were not latched because a previous fill error was reported in the
BIU_STAT. If the reported error is not a read, a fill error while merging write data from a write
has been logged. The logged error is not the cause of the machine check, but the fill_seo might
be. A hard error should be pending if the reported error was not correctable. If the reported error
is a read or a correctable fill error and lost_write is not set, the error causing fill_seo to set may
be the cause of the machine check, and can be retried unless the aborted instruction has altered
essential state.

If S_PCSTS<PTE_ER> is set, refer to Section 14.5.2.6.7 on PTE read errors.

Lost fill errors may be caused by more than one operand prefetch to the same cache block. 

Recovery for lost fill errors depends on whether the pending interrupt is a hard or soft error inter- 
rupt. The machine check error handling software should defer recovery until the expected hard or 
soft error interrupt occurs. Once the interrupt is taken, the error recovery and restart instructions 
found in the hard error interrupt and soft error interrupt sections should be referenced. 

Software should employ some mechanism to record that an interrupt for a lost fill error is pending. 
This mechanism should allow detection of a case in which an expected interrupt does not occur 
(once IPL is lowered). If the expected interrupt does not occur when IPL is lowered, then a serious 
inconsistency exists and the system should be crashed. 
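
One possible form of such a mechanism is sketched below. The class and method names are hypothetical, and the sketch assumes that software has a point after IPL is lowered at which a pending error interrupt would already have been delivered.

```python
class LostErrorInterruptTracker:
    """Remembers that an error interrupt is expected after a machine check."""

    def __init__(self):
        self.interrupt_expected = False

    def note_lost_error(self):
        """Called by the machine check handler when it defers recovery."""
        self.interrupt_expected = True

    def error_interrupt_taken(self):
        """Called on entry to the hard or soft error interrupt handler."""
        self.interrupt_expected = False

    def check_after_ipl_lowered(self):
        """Called once IPL has dropped below the error interrupt levels.

        By then the pending interrupt should already have been delivered,
        so a flag that is still set indicates a serious inconsistency.
        """
        return "crash" if self.interrupt_expected else "ok"
```

A single flag is enough for this sketch; a real handler might also record which of the hard and soft error interrupts it expects.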

Pending Interrupts: A hard or soft error interrupt should be pending, or possibly both.

Recovery procedures: No specific recovery action is required. Note that BIU_STAT<FILL_SEO>
is not cleared. It will be cleared by the hard or soft error interrupt handler.

Retry condition: Retry only if:

(VR = 1) OR (PSL<FPD> = 1).

14.5.2.6.5 BIU_HERR 

Description: An I-stream or D-stream read returned CACK_HERR from the system environment or
did not complete due to a tag or tag control parity error.

I-stream errors cause a machine check when the Ebox microcode requests dispatch to a new 
instruction execution microflow or attempts to access an operand within an instruction execution 
microflow. 

D-stream read errors cause a machine check when the Ebox microcode accesses prefetched 
operand data or when the Mbox returns data tagged with an error indication to the Ebox register 
file. 

D-stream ownership read errors cause a machine check when the Ebox microcode accesses 
prefetched operand data. 

Pending Interrupts (all cases): A soft error interrupt should be pending.

Recovery procedures (all cases): Clear BIU_STAT<BIU_HERR>.

Retry condition: Retry if:

(VR = 1) OR (PSL<FPD> = 1).

14.5.2.6.6 Lost BIU Error

Description: Some BIU errors were not latched because a previous BIU error was reported in
the BIU_STAT. If the reported error is not a read, a fill error while merging write data from a
write has been logged. The logged error is not the cause of the machine check, but the biu_seo
might be. A hard error should be pending. If the reported error is a read and lost_write is not
set, the error causing biu_seo to set may be the cause of the machine check, and can be retried
unless the aborted instruction has altered essential state.

If S_PCSTS<PTE_ER> is set, refer to Section 14.5.2.6.7 on PTE read errors.

Lost biu errors may be caused by more than one operand prefetch to the same cache block. 

Recovery for lost biu errors depends on whether the pending interrupt is a hard or soft error inter- 
rupt. The machine check error handling software should defer recovery until the expected hard or 
soft error interrupt occurs. Once the interrupt is taken, the error recovery and restart instructions 
found in the hard error interrupt and soft error interrupt sections should be referenced. 

Software should employ some mechanism to record that an interrupt for a lost biu error is pending. 
This mechanism should allow detection of a case in which an expected interrupt does not occur 
(once IPL is lowered). If the expected interrupt does not occur when IPL is lowered, then a serious 
inconsistency exists and the system should be crashed. 

Pending Interrupts: A hard or soft error interrupt should be pending, or possibly both.

Recovery procedures: No specific recovery action is required. Note that BIU_STAT<FILL_SEO>
is not cleared. It will be cleared by the hard or soft error interrupt handler.

Retry condition: Retry only if: 

(VR = 1) OR (PSL<FPD> = 1). 

14.5.2.6.7 PTE read errors 

The following sections describe error handling for PTE read errors. PTE read errors are read
errors which happen in reads issued by the Mbox in handling a TB miss. Handling of these errors
is different from handling the same underlying error (BIU_HERR, BC_TPERR, BC_TCPERR,
FILL_ECC) when PTE read isn't the cause.

If S_PCSTS<PTE_ER> is set, then a PTE read issued by the Mbox in processing a TB miss had 
an unrecoverable error. The TB miss sequence was aborted because of the error. The original 
reference can be any I-stream or D-stream read or write. If the original reference was issued by 
the Ebox, then the PTE read which incurred the error will have been retried once (because of a 
special hardware/microcode mechanism for handling PTE read errors on Ebox references). 

PTE read errors are difficult to analyze, partly because the read error report in the Cbox does 
not directly indicate that the failing read was a PTE read. Because of this and because PTE read
errors should be rare (a very small percentage of the reads issued by the Mbox are PTE reads), 
multiple errors which interfere with the analysis of the PTE error are not considered recoverable. 

The mechanism for reporting PTE read errors on Ebox references involves the Mbox forcing the 
Ebox (via a microtrap) into the microcode routine which normally handles memory management 
faults. This routine probes the address of the original reference, effectively retrying the failing 
PTE read. Assuming the error is not transient, the probe by microcode will cause a machine check. 
If the error does not occur on the probe, microcode restarts the current instruction stream. So 
machine checks caused by PTE read errors can easily occur with the particular PTE read error 
having occurred twice (with a lost error bit set in the relevant Cbox error register). The analysis 
here tolerates these particular multiple error reports and allows retry in those cases, provided 
the remainder of the error analysis indicates retry is appropriate. (Note that there is no way to 
tell from the information available to the machine check handler whether the original reference 
was an Ebox or Ibox reference.) 

If the reference which incurs the PTE read error is a write, S_PCSTS<PTE_ER_WR> will be set. 
In this case the original write is lost. No retry is possible partly because the instruction which 
took the machine check may be subsequent to the one which issued the failing write. Also, PTE 
read errors on write transactions can cause a machine check at a practically arbitrary time in
a microcode flow, and core machine state may not be consistent. 

14.5.2.6.7.1 PTE Read Errors in Interruptable Instructions

Another special case associated with PTE read errors exists for interruptable instructions
(specifically CMPC3, CMPC5, LOCC, MOVC3, MOVC5, SCANC, SKPC, and SPANC). For these
instructions, if the PTE read error occurred for an Ebox reference, the PC in the machine check stack
frame points to the instruction following the interrupted instruction. In this case, the SAVEPC
element in the machine check stack frame is the PC of the interrupted instruction. However in
all other cases, SAVEPC is UNPREDICTABLE. This case is not considered recoverable because
analysis of the error information cannot unambiguously conclude that this case is present. To
tell that this case might be present, the error handler examines the FPD bit in the PSL in the
machine check stack frame. If FPD is set in the stack frame (in the case of a PTE read error)
then one of the following is true:

• One of the interruptable instructions listed above incurred the PTE read error. In this case, 
SAVEPC from the machine check stack frame points to the interrupted instruction, and PC 
in the stack frame points to the next instruction. 

• An REI instruction loaded a PSL with FPD set and a certain PC. The Ibox incurred the PTE 
read error in fetching the opcode pointed to by that PC. In this case, the PC in the stack 
frame points to the instruction which was the target of the REI and SAVEPC from the stack 
frame is unpredictable. 

It is not possible to determine with certainty which of the two above cases is the cause of a machine 
check with S_PCSTS<PTE_ER> set and stack frame PSL<FPD> set. Retry is not possible since 
software can not tell which PC to restart with. However, software may wish to probe the location 
pointed to by the PC in the stack frame, expecting a possible machine check as a result. If 
a machine check does occur, that is information indicating that the second case occurred (not 
totally unambiguously, of course). A very good guess may be made by a person examining the
error report if the machine check stack frame and the result of this probe are available in the
report.

14.5.2.6.7.2 Uncorrectable ECC FILL Errors on PTE Reads

Description (uncorrectable ECC errors): A FILL uncorrectable data error was detected by
the Cbox in a PTE read. Uncorrectable data errors are the result of a multiple bit error in the
data read from the Bcache, or FILL from the system on a READ_BLOCK or LDxL.

Description (all cases): S_FILL_ADDR contains the cache address of the error, and
S_FILL_SYNDROME contains the syndrome calculated by the ECC logic. (If the physical address is
found to be in I/O space, it is an inconsistent status. See Section 14.5.2.7.)

S_BIU_STAT<FILL_SEO> may be set. This error is probably due to the same PTE error occurring 
more than once. This is an acceptable assumption unless a hard error interrupt occurs after 
handling this error. 

Pending Interrupts: A soft error interrupt should be pending. 

Recovery procedures (uncorrectable ECC errors): To recover, clear BIU_STAT<FILL_ECC>.

Recovery procedures (both cases): Flush the Bcache. Clear PCSTS<PTE_ER>.

Retry condition: If no writeback error occurs in the Bcache flush, retry if:

(VR = 1) AND (PSL<FPD> = 0) AND (S_PCSTS<PTE_ER_WR> = 0).

If

(PSL<FPD> = 1) OR (S_PCSTS<PTE_ER_WR> = 1),

crash the system. If a writeback error occurs in the Bcache flush, then the data is presumed to be
unrecoverable. Software must determine if the error is fatal to one process or the whole system
and take appropriate action.

14.5.2.6.7.3 CACK_HERR on PTE Read

Description: A PTE read returned CACK_HERR.

S_BIU_STAT<BIU_SEO> may be set. This error is probably due to the same PTE error occurring 
more than once. This is an acceptable assumption unless a hard error interrupt occurs after 
handling this error. 

Pending Interrupts: A soft error interrupt should be pending. 

Recovery procedures: Clear BIU_STAT<CACK_HERR>. Clear PCSTS<PTE_ER>.

Retry condition: Retry if:

(VR = 1) AND (PSL<FPD> = 0) AND (S_PCSTS<PTE_ER_WR> = 0).

Otherwise, crash the system.

Post Retry Recovery: If the same fill error recurs on retry, then the block is probably "lost". 
In this case the more general sense of "lost" is implied. Software must determine if the error is 
fatal to one process or the whole system and take appropriate action. 

NOTE 

It may be appropriate in this case to first cause each CPU in the system to flush its 
Bcache, and then retry once more. 
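
The retry condition for PTE read errors is stricter than the one used for other synchronous machine checks, since a set FPD bit or a failing write rules retry out. A minimal sketch follows, using the form given in Section 14.5.2.6.7.3 (retry or crash); the function name and the way the inputs are passed are assumptions for illustration.

```python
PSL_FPD = 1 << 27  # PSL<FPD>, bit 27 of the VAX PSL


def pte_read_error_action(vr, psl, pte_er_wr):
    """Return "retry" or "crash" for a machine check with S_PCSTS<PTE_ER> set.

    vr         -- VR bit from the machine check stack frame
    psl        -- PSL longword from the machine check stack frame
    pte_er_wr  -- value of S_PCSTS<PTE_ER_WR> (1 if the original reference was a write)
    """
    fpd = 1 if (psl & PSL_FPD) else 0
    if vr == 1 and fpd == 0 and pte_er_wr == 0:
        return "retry"
    return "crash"
```

For the ECC case in Section 14.5.2.6.7.2, this test would apply only after a Bcache flush that completed without a writeback error.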

14.5.2.7 Inconsistent Status in Machine Check Cause Analysis 

Description: A presumed impossible error report was found in the error registers. This could 
be due to a hardware failure or bug, or to incomplete analysis in this spec. 

Pending Interrupts: A hard or soft error interrupt should be pending, or possibly both. 

Recovery procedures: No specific recovery action is called for. 

Retry condition: No retry is possible. The integrity of the entire system is questionable. Crash
the system.

14.6 Hard Error Interrupts

Hard error interrupts are requested to report an error that was detected asynchronously with
respect to instruction execution. This results in an interrupt at IPL 1D (hex) to be dispatched
through SCB vector 60 (hex). Typically, these errors indicate that machine state has been corrupted
and that retry is not possible.

The stack frame for a hard error interrupt is shown in Figure 14-5.

Figure 14-5: Hard Error Interrupt Stack Frame

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              PC                                               | :(SP)
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              PSL                                              |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

14.6.1 Events Reported Via Hard Error Interrupts 

This section describes all the errors which can cause a hard error interrupt.

Figure 14-6: Cause Parse Tree for Hard Error Interrupts

HARD ERROR INTERRUPT
+---+ (select all, at least one)
    |
    | (status consistent with hard error interrupt
    |  in system environment error registers)
    +---> Hard error interrupt from system environment
    |     (Section 14.6.1.2)
    |
    | BIU_STAT<lost_write_err>
    +---> uncorrectable ECC error on a write from Mbox
    |     (Section 14.6.1.1)
    |
    | BIU_STAT<BIU_HERR> and BIU_STAT<BIU_CMD> = WRITE
    +---> System failure (timeout) on a write from Mbox
    |     (Section 14.6.1.1)
    |
    | BIU_STAT<BC_TPERR> and BIU_STAT<BIU_CMD> = WRITE
    +---> Bcache tag parity error on a write from Mbox
    |     (Section 14.6.1.1)
    |
    | BIU_STAT<BC_TCPERR> and BIU_STAT<BIU_CMD> = WRITE
    +---> Bcache tag control parity error on a write from Mbox
    |     (Section 14.6.1.1)
    |
    | BIU_STAT<FILL_ECC> and not BIU_STAT<CRD>
    | and BIU_STAT<ARB_CMD> = WRITE
    +---> uncorrectable ECC error on a write from Mbox
    |     (Section 14.6.1.1)
    |
    | otherwise
    +---> inconsistent status (Section 14.6.1.3)

Notation:

(select all, at least one)   All the cases are possible causes of a hard error interrupt.
                             More than one may be true. At least one must be true or the status
                             is inconsistent.
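
The tree in Figure 14-6 can be walked mechanically. The sketch below is illustrative only: it assumes BIU_STAT has already been decoded into a dictionary of boolean flags plus "BIU_CMD" and "ARB_CMD" strings, and that a separate boolean reports a consistent hard error status in the system environment registers. Neither that decoded form nor the function name comes from this specification, and no register bit positions are implied.

```python
def hard_error_causes(biu_stat, system_env_error=False):
    """Return every possible cause from Figure 14-6 (select all, at least one)."""

    def flag(name):
        return bool(biu_stat.get(name, False))

    biu_write = biu_stat.get("BIU_CMD") == "WRITE"
    arb_write = biu_stat.get("ARB_CMD") == "WRITE"

    cases = [
        ("hard error interrupt from system environment", system_env_error),
        ("uncorrectable ECC error on a write from Mbox (lost_write_err)",
         flag("lost_write_err")),
        ("system failure (timeout) on a write from Mbox",
         flag("BIU_HERR") and biu_write),
        ("Bcache tag parity error on a write from Mbox",
         flag("BC_TPERR") and biu_write),
        ("Bcache tag control parity error on a write from Mbox",
         flag("BC_TCPERR") and biu_write),
        ("uncorrectable ECC error on a write from Mbox",
         flag("FILL_ECC") and not flag("CRD") and arb_write),
    ]
    causes = [text for text, is_true in cases if is_true]
    # With no true case the status is inconsistent (fall-through branch).
    return causes or ["inconsistent status (Section 14.6.1.3)"]
```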



14.6.1.1 Uncorrectable Errors During Write or Write-Unlock Processing

Description: In processing a write or write-unlock, the Cbox detected a CACK = HERR from
the system, a tag parity error, a control parity error, or an uncorrectable ECC error on the data
read which is to be merged. Data from the write is lost.

Uncorrectable ECC errors indicate that two or more bits of the stored data quadword have
changed and the error correcting code cannot correct the data. The write merge sequence is
aborted.

Recovery procedures: The data in this block is lost.

Restart condition: If the address of the data is available and no unexpected writeback errors
occurred during the Bcache flush, software must determine if the lost data is fatal to one process
or the whole system and take the appropriate action.

14.6.1.2 System Environment Hard Error Interrupts 

TBS.

14.6.1.3 Inconsistent Status in Hard Error Interrupt Cause Analysis

Description: A presumed impossible error report was found in the error registers. This could 
be due to a hardware failure or bug. 

Recovery procedures: No specific recovery action is called for. 

Restart condition: No retry is possible. The integrity of the entire system is questionable.
Crash the system.

14.7 Soft Error Interrupts

Soft error interrupts are requested to report errors which were detected, but did not affect
instruction execution. This results in an interrupt at IPL 1A (hex) to be dispatched through SCB
vector 54 (hex).

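The stack frame for a soft error interrupt is shown in Figure 14-7.

Figure 14-7: Soft Error Interrupt Stack Frame

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              PC                                               | :(SP)
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              PSL                                              |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+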



14.7.1 Events Reported Via Soft Error Interrupts 

This section describes the errors which can cause a soft error interrupt. 

Note that many errors which cause a soft error interrupt may also lead to a machine check 
exception. For this reason, a soft error interrupt with no apparent cause is not an inconsistent 
state unless the CPU has executed an instruction while IPL was lower than 1A (hex) since the 
most recent machine check exception. 
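
The consistency rule above can be sketched as a simple check. It assumes, purely for illustration, that software keeps a record of the IPL values at which instructions have executed since the most recent machine check exception; the function name and that record are hypothetical.

```python
SOFT_ERROR_IPL = 0x1A  # soft error interrupts are delivered at IPL 1A (hex)


def causeless_soft_error_is_inconsistent(ipls_since_machine_check):
    """Judge a soft error interrupt for which no cause was found.

    ipls_since_machine_check -- IPL values at which the CPU executed
    instructions since the most recent machine check exception.

    The interrupt is only an inconsistent state if some instruction ran
    with IPL below 1A (hex); otherwise it may be the interrupt left
    pending by the error that caused the machine check.
    """
    return any(ipl < SOFT_ERROR_IPL for ipl in ipls_since_machine_check)
```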

When a soft error interrupt is the only notification for any memory read error which could cause 
a machine check, the error didn't cause a machine check for one of the following reasons. 

• The error did not occur on the quadword the Ebox or Ibox requested (Pcache fill error). 

• The Ebox took an interrupt before accessing an instruction or operand which was prefetched 
by the Ibox. (It could be this soft error interrupt.) 

• A prefetched instruction or operand belonged to an instruction following a mispredicted 
branch, so the Ebox never executed the instruction (and it was flushed from the pipeline 
when the branch mispredict was recognized). 

• The Ebox took an exception for a different reason before attempting to use an instruction
execution dispatch or access an operand prefetched by the Ibox. (The pipeline was flushed
because of the exception.)

Figure 14-8: Cause Parse Tree for Soft Error Interrupts

SOFT ERROR INTERRUPT
+---+ (select all, at least one)
    |
    | S_ICSR<LOCK>
    +---+ (select all, at least one)
    |   |
    |   | S_ICSR<DPERR0>
    |   +---> VIC (virtual instruction cache) data parity error in bank 0
    |   |     (Section 14.7.1.1)
    |   |
    |   | S_ICSR<TPERR0>
    |   +---> VIC tag parity error in bank 0 (Section 14.7.1.1)
    |   |
    |   | S_ICSR<DPERR1>
    |   +---> VIC data parity error in bank 1 (Section 14.7.1.1)
    |   |
    |   | S_ICSR<TPERR1>
    |   +---> VIC tag parity error in bank 1 (Section 14.7.1.1)
    |   |
    |   | none of the above
    |   +---> inconsistent status (no ICSR error bits set)
    |
    | S_PCSTS<LOCK>
    +---+ (select all, at least one)
    |   |
    |   | S_PCSTS<DPERR>
    |   +---> Pcache data parity error (Section 14.7.1.2)
    |   |
    |   | S_PCSTS<RIGHT_BANK>
    |   +---> Pcache tag parity error in right bank
    |   |     (Section 14.7.1.2)
    |   |
    |   | S_PCSTS<LEFT_BANK>
    |   +---> Pcache tag parity error in left bank
    |   |     (Section 14.7.1.2)
    |   |
    |   | otherwise
    |   +---> inconsistent status (no PCSTS error bits set)
    |



Figure 14-8 Cont'd on next page 
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Figure 14-8 (Cont.): Cause Parse Tree for Soft Error Interrupts 



EIU_STAT<iost_write_err> 

— — — — — —————— — — — > £ write error occurred after the E ERR 



I S_PCSTS<PTE_ER_WR> 

+ Z — ■ > hard error on a PTE DREAD for WRITE or WRIT£_UNXOCK 

I {Section 14.6.1.1) 

I 

I not S_PCSTS<PTE_ER_WR> 
+-— j. 

I I 

I ! BIU_STAT<EIU_HERR> and BIU_£TAT<BIU_CMD> - READ 

I «. — — > hard error from system on read 

I i 

I i 

I : Bir_ETAT<SIU_SERR> 

I * — — — - — — — — — _ > soft error from system 

! ! (LASER/PVK do not issue cack £_£RR) 

I ! 

I ! BIU_STAT<BC_TPERR> and BItj_STAT<E Itt_CMD> - READ 

I — — — — ————————> tag parity error on read 

I I 

I I 

I ; BIU_STAT<BC_TCPERR> and BIU_STAT<BIU_CKD> - READ 

I — — — — — — — — — — — — — _> tag control parity error on read 

II 

I i - 

I I BIV_STAT<riLL_ECC> and BID_STAT<CRD> 

I + —————————_—————> correctable ECC error on fill or write merge 

I I 

I I 

I ! BIU_STAT<FILL_ECC> and not BIC_ETAT<CRD> and BIU_STAT<ARB_CMD> - READ 

I ■+■- — ——————————— > uncorrectable ECC error on fill 

I (Section 14. -.1.3) 

I 

I none of the above 

-i. _ — - — — — — — — — — .... — > inconsistent status 



Notation:

(select one)                  Exactly one case must be true. If zero or more than one is
                              true, the status is inconsistent.

(select all)                  More than one case may be true.

(select all, at least one)    All the cases are possible causes of a soft error interrupt.
                              More than one may be true. At least one must be true or the status
                              is inconsistent. A case is not considered true if it evaluates to
                              "Not a soft error interrupt cause".

otherwise                     Fall-through case for (select one) if no other case is true.

none of the above             Fall-through case for (select all) or (select all, at least one)
                              if no other case is true.
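
The selection rules in the notation can be sketched in software. The following is a minimal illustration, not code from this specification: the function names and the use of a dictionary of booleans to represent the saved ICSR error bits are assumptions made for the example.

```python
# Sketch of the parse-tree selection rules (hypothetical helper names).

INCONSISTENT = "inconsistent status"


def select_one(cases):
    """Exactly one case must be true, otherwise the status is inconsistent."""
    causes = [cause for condition, cause in cases if condition]
    return causes if len(causes) == 1 else [INCONSISTENT]


def select_all_at_least_one(cases):
    """Any number of cases may be true, but at least one must be."""
    causes = [cause for condition, cause in cases if condition]
    return causes if causes else [INCONSISTENT]


def icsr_causes(icsr):
    """Evaluate the S_ICSR<LOCK> subtree; icsr is a dict of booleans (assumed form)."""
    return select_all_at_least_one([
        (icsr.get("DPERR0", False), "VIC data parity error in bank 0"),
        (icsr.get("TPERR0", False), "VIC tag parity error in bank 0"),
        (icsr.get("DPERR1", False), "VIC data parity error in bank 1"),
        (icsr.get("TPERR1", False), "VIC tag parity error in bank 1"),
    ])
```

With ICSR<LOCK> set but no error bit recorded, icsr_causes returns the inconsistent status case, which matches the "none of the above" branch of the tree.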



14.7.1.1 VIC Parity Errors 

Description: A parity error was detected in the VIC tag or data store in the Ibox. 

VIC Data Parity Errors: A parity error occurred in data bank 0 (DPERR0) or data bank 1
(DPERR1) of the VIC.

VIC Tag Parity Errors: A parity error occurred in tag bank 0 (TPERR0) or tag bank 1 (TPERR1)
of the VIC.

In all cases, the quadword virtual address of the error is in S_VMAR.

Recovery procedures: To recover, disable and flush the VIC by re-writing all the tags (using
the procedure in Section 14.3.3.1.1.1). Also, clear ICSR<LOCK>.

14.7.1.2 Pcache Parity Errors

Description: A parity error was detected in the Pcache. Either a tag parity error or a data
parity error is reported, though tag parity errors in both the left and right banks may be reported
simultaneously. The reference, whether it was a read or write, was passed to the Cbox as if the
Pcache had missed. No data is lost. The Pcache is disabled because PCSTS<LOCK> is set.

S_PCADR contains the physical address of the operation incurring the error. The address should not
be in I/O space. If it is, it is an inconsistent status.

Recovery procedures: Clear PCSTS<LOCK>. Flush the Pcache and initialize the Pcache tag
store.

14.7.1.3 FILL Uncorrectable ECC Errors on I-Stream or D-Stream Reads

Description (uncorrectable ECC error): A Fill uncorrectable ECC error was detected by the
Cbox in an I-stream or D-stream read. Uncorrectable data errors are the result of multiple bit
errors in the data read.

Description: S_FILL_ADDRESS contains the address of the error, and S_FILL_SYNDROME
contains the syndrome calculated by the ECC logic. (If the physical address is found to be in I/O
space, it is an inconsistent status.)

Recovery procedures: To recover, clear BIU_STAT<FILL_ECC>.

Flush the Bcache. **(BC_TAG CAN BE USED TO DETERMINE IF THE FILL IS FROM 
BCACHE)** If the data is DIRTY in the Bcache and if the error repeats itself (is not transient), 
then a writeback error will result from the flush procedure. 

Restart Conditions: If a writeback error occurs in the Bcache flush, then the data is presumed 
to be unrecoverable. Software must determine if the error is fatal to one process or the whole 
system and take appropriate action. 

If the address of the error in the flush is not the same as that of the original error, this is a 
multiple error case in the data RAMs and is a serious failure. Crash the system. 

PTE read errors are difficult to analyze, partly because the read error report in the Cbox does 
not directly indicate that the failing read was a PTE read. Because of this and because PTE read 
errors should be rare (a very small percentage of the reads issued by the Mbox are PTE reads), 
multiple errors which interfere with the analysis of the PTE error are not considered recoverable. 

If the reference which incurs the PTE read error is a write, S_PCSTS<PTE_ER_WR> will be set.
In this case the original write is lost. No retry is possible, partly because the instruction which
took the machine check may be subsequent to the one which issued the failing write. Also, PTE
read errors on write transactions can cause a machine check at a practically arbitrary time in
a microcode flow, and core machine state may not be consistent.

Restart condition: If no writeback error occurs in the Bcache flush, restart if:

(S_PCSTS<PTE_ER_WR> = 0).

If

(S_PCSTS<PTE_ER_WR> = 1),

crash the system.

If a writeback error occurs in the Bcache flush, then the data is presumed to be unrecoverable
(software must determine if the error is fatal to one process or the whole system and take
appropriate action). Clear PCSTS<PTE_ER>.

Restart condition: Restart if:

(S_PCSTS<PTE_ER_WR> = 0).

Otherwise, crash the system.
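
The restart conditions above can be summarized as a small decision function. This is only one reading of the text, offered as a sketch: the function name, its arguments, and the returned strings are hypothetical, and the ordering of the checks is an assumption where the text leaves it open.

```python
# Sketch of the restart decision after a FILL uncorrectable ECC error.
# All names are illustrative; the specification defines no such routine.

def fill_ecc_restart_decision(writeback_error, same_address, pte_er_wr):
    """Return the suggested action after clearing FILL_ECC and flushing the Bcache.

    writeback_error: a writeback error occurred during the Bcache flush
    same_address:    the flush error address equals the original error address
    pte_er_wr:       S_PCSTS<PTE_ER_WR> was set (the original write is lost)
    """
    if pte_er_wr:
        # A PTE read error on a write cannot be retried.
        return "crash"
    if writeback_error and not same_address:
        # Multiple errors in the data RAMs are a serious failure.
        return "crash"
    if writeback_error:
        # Data is presumed unrecoverable; software decides whether the
        # error is fatal to one process or to the whole system.
        return "software decides"
    return "restart"
```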

14.7.1.3.1 Multiple Errors Which Interfere with Analysis of PTE Read Error 

Because PTE read errors lead to several unusual cases, restart is not recommended in the event 
that other errors cloud the analysis of the PTE read error. 

Pending Interrupts: A hard or soft error interrupt should be pending, or possibly both. 
Recovery procedures: No specific recovery action is called for. 
Restart condition: No restart is possible. Crash the system. 

14.8 Kernel Stack Not Valid Exception

A Kernel Stack Not Valid Exception occurs when a memory management exception is detected 
while attempting to push information on the kernel stack during microcode processing of another 
exception. Note that a console halt with an error code of ERR_INTSTK is taken if a memory 
management exception is encountered while attempting to push information on the interrupt 
stack. 

The Kernel Stack Not Valid exception is dispatched through SCB vector 08 (hex) with the stack 
frame shown in Figure 14-9. 

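Figure 14-9: Kernel Stack Not Valid Stack Frame

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              PC                                               | :(SP)
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              PSL                                              |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+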

DIGITAL CONFIDENTIAL 






NVAX Plus CPU Chip Functional Specification, Revision 0.3, October 1991 

14.9 Error Recovery Coding Examples 

To be supplied. 

14.10 Revision History 



Table 14-6: Revision History

Who               When           Description of change

Mike Uhler        19-Dec-1989    Update for second-pass release.
John Edmondson    30-Jun-1990    Update further after internal review and resolution of many issues.
Gil Wolrich       20-Feb-1991    Modify for NVAX Plus.
Gil Wolrich       01-Aug-1991    Update.

Chapter 15
Chip Initialization



15.1 Overview 

This chapter describes the hardware initialization process for the NVAX Plus chip. The hardware
and microcode start the initialization. Then, if SROM_FAST is not set, 8K bytes of data are read
from the serial ROM and loaded into the Pcache. If SROM_FAST is set, microcode skips the serial
ROM load and passes control to macrocode at address E0040000 (hex).

Much of the job of initialization involves setting the NVAX internal processor registers (IPRs)
to a known state, or using NVAX IPRs to perform functions such as cache initialization. See
Chapter 2 for a list of the NVAX IPRs. Also, see the individual box chapters for a more in-depth
definition of many of the IPRs.

15.2 Hardware/Microcode initialization 

The NVAX Plus chip hardware initializes to the following state on powerup or the assertion of
chip reset:

1. The VIC, Pcache, and Bcache are disabled.

2. The RLOG is cleared. 

3. The Fbox is disabled. 

4. The microstack is cleared. 

5. The Mbox and Cbox are reset, and all previous operations are flushed. 

6. The Fbox is reset. 

7. The Ibox is stopped, waiting for a LOAD PC. 

8. All instruction and operand queues are flushed. 

9. All MD valid bits are cleared, and all Wn valid bits are set. 

10. A powerup microtrap is initiated which starts the Ebox at the label IE.POWERUP. 

The NVAX Plus Chip microcode at IE.POWERUP then does the following: 

1. Hardware interrupt requests are cleared. 

2. BIU_STAT is cleared.

3. BIU_CTL is cleared. PV mode is the default.

4. ICCS is cleared.

5. SISR<15:1> is set to 0.

6. ASTLVL is set to 4. 

7. The Mbox PAMODE IPR is set to 30-bit physical address mode. 

8. CPUID is set to 0. 

9. The BPCR branch history algorithm is reset to the default value. 

10. Backup PC is retrieved from the Ibox and saved in SAVPC. 

11. PME is cleared. The performance monitoring counters are cleared. 

12. The current PSL, halt code, and value of MAPEN are saved in SAVPSL. 

13. MAPEN is cleared (memory management is disabled). 

14. All state flags are cleared.

15. PSL is loaded with 041F0000.

16. PCSTS is cleared.

17. If not SROM_FAST, the Pcache is loaded from the serial ROM.

18. If SROM_FAST, the PC is loaded with E0040000.

The powerup microcode provides a means for loading start-up code from the serial ROM. This
microcode could also be used for loading the burn-in and life-test programs. The P-cache is loaded
with bit-serial instruction stream data.

        o Enable serial ROM; this will also tell the C-box we are reading
          the serial ROM.
        o Check SROM_FAST bit; if set, go to serial ROM fast code.
        o Begin normal serial ROM read and P-cache load; enable P-cache.
loop:   o Assert serial line out high for a minimum of 200ns.
        o Assert serial line out low for a minimum of 200ns.
        o Read data from serial line in and append value onto I-stream data.
        o If I-stream data = 32 bits, then write into P-cache, VA = VA + 4.
        o If every 8th longword written, then write new tag data
          for the next P-cache tag.
        o If I-stream data = 32K bits, then switch P-cache banks.
        o If I-stream data = 64K bits, then go to exit:
        o Go to loop:

exit:   o Write address of power up code to console halt reg.
        o Disable SROM, join console code to load PC.

        o PC is loaded with beginning address of SROM code that was loaded into
          the P-cache.
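
The load loop above can be modeled in software to check its bookkeeping (64K bits make 2048 longwords, with a tag write after every 8th longword and one bank switch at the 32K-bit mark). The sketch below is an illustration only: the function name, the dictionary used to stand in for the P-cache, and the least-significant-bit-first assembly order are assumptions, since the specification does not state the bit order.

```python
# Software model of the bit-serial P-cache load loop (illustrative only).

LONGWORD_BITS = 32
TAG_INTERVAL = 8                 # new tag data after every 8th longword
BANK_SWITCH_BITS = 32 * 1024     # switch P-cache banks at 32K bits
TOTAL_BITS = 64 * 1024           # exit at 64K bits (8K bytes)
START_VA = 0xE0040000            # start address of the loaded SROM code


def load_pcache_from_srom(bits, start_va=START_VA):
    """Consume serial bits and return (pcache, tag_writes, bank_switch_va)."""
    pcache = {}            # VA -> longword, stands in for the P-cache data store
    tag_writes = []        # VA at which each new tag would be written
    bank_switch_va = None  # VA at which the bank switch happens
    va = start_va
    word = 0
    bits_in_word = 0
    total_bits = 0
    for bit in bits:
        # Assumption: bits are assembled least-significant bit first.
        word |= (bit & 1) << bits_in_word
        bits_in_word += 1
        total_bits += 1
        if bits_in_word == LONGWORD_BITS:
            pcache[va] = word
            va += 4
            word = 0
            bits_in_word = 0
            if len(pcache) % TAG_INTERVAL == 0:
                tag_writes.append(va)
        if total_bits == BANK_SWITCH_BITS:
            bank_switch_va = va
        if total_bits == TOTAL_BITS:
            break
    return pcache, tag_writes, bank_switch_va
```

Feeding the model more than 64K bits shows that exactly 2048 longwords are written and that any extra bits are ignored.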

NOTE:

The serial ROM fast code does nothing except load the
console halt register with what would be the start-up address of
the SROM code and joins the console halt flow to load the value
in that register as the next PC and jump to it. The P-cache is
disabled.

On normal serial ROM loading, the P-cache is enabled for I-stream,
D-stream, and parity error detection. All tags have been initialized
and force hit is not enabled. Again the console halt register is
loaded with E0040000, which is the beginning of where the SROM code was
loaded. This value is used for the start PC.

15.3 Console initialization

The console macrocode has the job of filling the gap between the initialized state described above 
and the initial state needed for the operating system. To that end, the console code does the 
following: 

1. Set CPUID to the correct value from the system environment. 

2. Set ECR (Ebox Control Register) as follows: 

1. Set FBOX_ENABLE to enable the Fbox.

2. Set S3_TIMEOUT_EXT as required by the system environment.

3. Set FBOX_ST4_BYPASS_ENABLE to enable Fbox stage 4 bypass.

4. Write one to S3_STALL_TIMEOUT to clear any error.

3. Set ICSR (Ibox Control Status Register) as follows: 

1. Clear ENABLE to leave the VIC disabled. 

2. Write one to LOCK to clear any error. 

4. Set the PAMODE register MODE bit as required by the system.

5. Set up BIU_CTL (Bcache/System Control) as required by the system.

15.4 Other initialization 

Either the console code or the operating system will do the following final initialization steps 
(code examples are given): 

1. Initialize the VIC 

        VIC_MAX_INDEX  := 3E0 (hex)
        VIC_INDEX_STEP := 20 (hex)
        VIC_TAG_INIT   := 0

        FOR INDEX := 0 TO VIC_MAX_INDEX BY VIC_INDEX_STEP DO
            BEGIN
            MTPR INDEX,VMAR
            MTPR VIC_TAG_INIT,VTAG
            END;

2. Enable the VIC 

MTPR ENABLE, ICSR 

3. Initialize the Pcache. Enable the Pcache. The Pcache is initialized by microcode if not
SROM_FAST.

4. Initialize the Bcache

5. Enable the Bcache, set BIU_CTL[0]
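
The VIC initialization loop in step 1 can be checked with a short model. This is a sketch under stated assumptions: the mtpr callback is a hypothetical stand-in for the MTPR instruction, and the FOR loop is taken to include its upper bound, as in Pascal, which gives 32 tag writes.

```python
# Model of the VIC tag initialization loop (mtpr is a hypothetical callback).

VIC_MAX_INDEX = 0x3E0
VIC_INDEX_STEP = 0x20
VIC_TAG_INIT = 0


def init_vic(mtpr):
    """Write every VIC tag; mtpr(register_name, value) models an MTPR."""
    for index in range(0, VIC_MAX_INDEX + 1, VIC_INDEX_STEP):
        mtpr("VMAR", index)          # select the tag by index
        mtpr("VTAG", VIC_TAG_INIT)   # write the initial tag value


writes = []
init_vic(lambda register, value: writes.append((register, value)))
```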

15.5 Revision History

Table 15-1: Revision History

Who                      When           Description of change

Debra Bernstein          9-May-1990     Initial edit
Jim Ellis/Gil Wolrich    15-Jan-1991    NVAX Plus release for external review

Chapter 16

Performance Monitoring Facility



16.1 Overview

The NVAX CPU chip contains a facility by which privileged software may obtain performance in- 
formation about the dynamic behavior of the CPU. The facility is implemented with a combination 
of hardware and microcode, and controlled by software using privileged instructions. 

Two 64-bit performance counters called PMCTR0 and PMCTR1 are maintained in memory for
each CPU in the system. The lower 16 bits of each counter are implemented in hardware in the
CPU, and at specified points, microcode updates the quadwords in memory with the contents of
the hardware counters.

The performance monitoring facility may be configured by privileged software to count a number 
of events in the system, from which performance analysis data such as cache and TB hit rates, 
cycles-per-instruction, and stall frequencies may be calculated. 

16.2 Software Interface to the Performance Monitoring Facility 

The performance monitoring facility makes use of a data structure in memory, and must be 
configured and enabled via a location in the System Control Block, processor register references, 
and the LDPCTX instruction. 

16.2.1 Memory Data Structure

The two 64-bit performance counters for each CPU are maintained in a data structure in memory.
This data structure consists of a pair of quadwords for every CPU in the system. The physical
address of the base of the data structure is obtained from offset 58 (hex) in the System Control
Block. The format of this location is shown in Figure 16-1.

Figure 16-1: Performance Monitoring Data Structure Base Address

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|              Physical Address of Performance Monitoring Data Structure               | 0| 1| 1| :SCB+58(hex)
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+



NOTE 

A quadword-aligned physical base address is constructed by clearing the lower 3 bits
of the longword fetched from offset 58 (hex) in the SCB. Microcode will not update
the block in memory unless bits <2:0> of this longword contain 011 (binary). If these
bits are found to contain another value, a machine check with code MCHK_PMF_
CONFIG is performed to notify software that the performance monitoring facility was
incorrectly configured. It is strongly suggested that the physical address be at least
octaword aligned, and preferably page aligned.

The address of the pair of quadwords for an individual CPU is computed by shifting the CPUID
value left 4 bits and adding this value to the base address. This calculation is shown in equation
form below (all numbers in these equations are hex).



    phys.base.addr  = SCB[58] AND FFFFFFF8;
    phys.block.addr = (CPUID LSHIFT 4) + phys.base.addr;
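
As a worked example of this calculation, the sketch below follows the NOTE above: it checks that bits <2:0> contain 011 (binary), clears the lower 3 bits to form the base, and adds the CPUID shifted left by 4. The function and exception names are hypothetical; in the chip this work is done by microcode, and the failure case is a machine check rather than a software exception.

```python
# Worked example of the per-CPU block address calculation (illustrative names).

class PmfConfigError(Exception):
    """Stands in for the MCHK_PMF_CONFIG machine check."""


def pmf_block_address(scb_58, cpuid):
    """Return the physical address of the counter pair for one CPU.

    scb_58: longword fetched from offset 58 (hex) in the SCB
    cpuid:  CPUID value of the processor
    """
    if scb_58 & 0x7 != 0b011:
        raise PmfConfigError("MCHK_PMF_CONFIG")
    base = scb_58 & 0xFFFFFFF8        # clear the lower 3 bits
    return (base + (cpuid << 4)) & 0xFFFFFFFF
```

For example, with the SCB longword 00012003 (hex) and CPUID 2, the block for that CPU starts at 00012020 (hex).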
The format of the pair of quadwords for each CPU is shown in Figure 16-2. 



Figure 16-2: Per-CPU Performance Monitoring Data Structure

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                     PMCTR0, low longword                                      | :+00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                     PMCTR0, high longword                                     | :+04
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
 63 62 61 60|59 58 57 56|55 54 53 52|51 50 49 48|47 46 45 44|43 42 41 40|39 38 37 36|35 34 33 32

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                     PMCTR1, low longword                                      | :+08
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                     PMCTR1, high longword                                     | :+12
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
 63 62 61 60|59 58 57 56|55 54 53 52|51 50 49 48|47 46 45 44|43 42 41 40|39 38 37 36|35 34 33 32



16.2.2 Memory Data Structure Updates

When the performance monitoring facility is enabled, the memory data structure is updated from
the hardware counters if one of the counters is more than half full and the current processor
IPL is below 1B (hex), if a LDPCTX instruction is executed and the PME bit in the new PCB is
off, or if the performance monitoring facility is disabled via a write to the PME processor register.
The PME bit is internally implemented as ECR<PMF_ENABLE>, with conversion handled by
microcode.

When one of the counters reaches half full, an interrupt at IPL 1B (hex) is requested. This
interrupt request is serviced like any other interrupt if the IPL of the processor is below that of the
interrupt request IPL. Like any other interrupt, it is serviced between instructions (or in the
middle of the interruptable string instructions). Unlike other interrupts, the performance monitoring
interrupt is serviced entirely by microcode, with no software interrupt handler required.

When a performance monitoring interrupt occurs, microcode temporarily disables the facility, 
reads and clears the hardware counters, then updates the memory data structure with the hard- 
ware counts. The facility is then re-enabled, the interrupt is dismissed, and the interrupted 
instruction stream is restarted. 
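
The update sequence just described can be modeled as follows. This is a software sketch, not the microcode: the dictionaries standing in for the hardware counters and the in-memory quadwords, and the function name, are assumptions made for illustration.

```python
# Model of the microcode service sequence for a performance monitoring interrupt.

MASK64 = (1 << 64) - 1


def service_pmf_interrupt(hw, mem):
    """Fold the 16-bit hardware counts into the 64-bit memory counters.

    hw:  dict with keys "enabled", "pmctr0", "pmctr1" (hardware state, assumed form)
    mem: dict with keys "PMCTR0", "PMCTR1" (the per-CPU quadwords in memory)
    """
    hw["enabled"] = False                      # temporarily disable the facility
    count0, count1 = hw["pmctr0"], hw["pmctr1"]
    hw["pmctr0"] = hw["pmctr1"] = 0            # read and clear the hardware counters
    mem["PMCTR0"] = (mem["PMCTR0"] + count0) & MASK64
    mem["PMCTR1"] = (mem["PMCTR1"] + count1) & MASK64
    hw["enabled"] = True                       # re-enable before restarting the stream


hw = {"enabled": True, "pmctr0": 0x8000, "pmctr1": 0x1234}
mem = {"PMCTR0": 100, "PMCTR1": 7}
service_pmf_interrupt(hw, mem)
```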

NOTE 

Although the performance monitoring facility is disabled during the memory update 
process, it is re-enabled for the restart of the interrupted instruction stream. Therefore, 
depending on what events were selected, the facility may count events that are part of 
the restart process. 

At the maximum rate (one increment every 14ns CPU cycle), an interrupt is requested every 459 
microseconds. 
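
The 459 microsecond figure follows from the counter width: a 16-bit counter is half full after 2^15 increments, and at one increment per 14ns cycle that takes 32768 x 14ns = 458752ns. A short check of the arithmetic (the constant and function names are illustrative):

```python
# Check of the maximum interrupt rate arithmetic.

CYCLE_NS = 14             # CPU cycle time used in the text
HALF_FULL = 1 << 15       # a 16-bit counter is half full at bit <15>


def interrupt_period_us(cycle_ns=CYCLE_NS):
    """Microseconds between interrupt requests at one increment per cycle."""
    return HALF_FULL * cycle_ns / 1000.0
```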

If a LDPCTX is executed and the PME bit in the new PCB is off, or if the performance monitoring 
facility is disabled via a write to the PME processor register, the microcode disables the perfor- 
mance monitoring facility, reads and clears the hardware counters, and updates the memory data 
structure for the CPU with the hardware counts. 

NOTE 

The hardware counters are not cleared, and the memory data structures are not
updated when the performance monitoring facility is disabled via a direct write to
ECR<PMF_ENABLE>.

16.2.3 Configuring the Performance Monitoring Facility 

Before the performance monitoring facility is enabled, software must select the source of the event 
to be counted. This is accomplished first by selecting the box that reports the event, and then by 
selecting the event that is to be counted. The box selection is made by writing to the PMF_PMUX
field in the ECR processor register, as indicated by Table 16-1.

Table 16-1: Performance Monitoring Facility Box Selection

ECR<PMF_PMUX>
(binary)          Source of Information

00                Ibox
01                Ebox
10                Mbox
11                Cbox



The event selection within the box is made by writing to a processor register within the box, as 
described in subsequent sections, and in the box chapters elsewhere in this specification. 

The hardware used to implement the 16-bit counters is constructed such that the PMCTR1
counter increments only if both its selected event and the PMCTR0 selected event are true
simultaneously. As such, PMCTR1 is a strict subset of PMCTR0. As a result, some combinations
of event selections will not cause PMCTR1 to be incremented. In some boxes, the event selection
is specified in such a way that compatible events are automatically selected. In other boxes, the
user must specify compatible events. Where they are required, compatible events are described
in the sections below.
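
The subset relationship can be illustrated with a small per-cycle model. The sketch below assumes each cycle is described by a pair of booleans (the selected PMCTR0 event and the selected PMCTR1 event); the function name and that representation are illustrative, not part of the hardware definition.

```python
# Model of the counter gating: PMCTR1 increments only when PMCTR0 also increments.

def count_events(cycles, enabled=True):
    """cycles: iterable of (pmctr0_event, pmctr1_event) booleans, one pair per cycle."""
    pmctr0 = pmctr1 = 0
    for event0, event1 in cycles:
        increment0 = enabled and event0      # gated by the facility enable
        increment1 = increment0 and event1   # gated by the PMCTR0 increment
        pmctr0 += 1 if increment0 else 0
        pmctr1 += 1 if increment1 else 0
    return pmctr0, pmctr1
```

A cycle in which only the PMCTR1 event is true counts in neither counter, which is why incompatible event selections leave PMCTR1 at zero.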

16.2.3.1 Ibox Event Selection 

The Ibox reports only one event, so if the Ibox is selected, that event is also selected. The Ibox
inputs to the PMCTR0 and PMCTR1 hardware counters are shown in Table 16-2.

Table 16-2: Ibox Event Selection

PMCTR0 Input    PMCTR1 Input    Description; Use

VIC Access      VIC Hit         VIC hits compared to total VIC accesses; VIC hit ratio.



16.2.3.2 Ebox Event Selection

The Ebox reports several events, as selected by the PMF_EMUX field in the ECR processor
register. The Ebox inputs to the PMCTR0 and PMCTR1 counters are shown in Table 16-3.

Table 16-3: Ebox Event Selection

ECR<PMF_EMUX>
(binary)    PMCTR0 Input          PMCTR1 Input          Description; Use

000         Cycles                S3 Stall              S3 stalls (source queue, MD, Wn, Fbox scoreboard
                                                        hit, Fbox input) compared to total cycles; S3 stalls
                                                        per unit time.

001         Cycles                EM+PA queue Stall     EM latch and PA queue stalls compared to total
                                                        cycles; EM+PA queue stalls per unit time.

010         Cycles                Instruction Retire    Ebox and Fbox instructions retired compared to total
                                                        cycles; CPI.

011         Cycles                Total stall           Total Ebox stalls compared to total cycles; Stalls per
                                                        unit time.

100         Total stall           S3 Stall              S3 stalls compared to total stalls; S3 stalls as a
                                                        percentage of all stalls.

101         Total stall           EM+PA queue Stall     EM latch and PA queue stalls compared to total
                                                        stalls; EM and PA queue stalls as a percentage of
                                                        all stalls.

111         S5 Microword event    S5 Microword event    Number of times a microinstruction whose MISC field
                                                        contained INCR.PERF.COUNT reached S5. By using
                                                        the patchable control store, one may count
                                                        microcode events by setting the MISC field of selected
                                                        microwords to this value. If this event is selected,
                                                        writing to the PMFCNT processor register will
                                                        increment the counters via the MISC field decode.



16.2.3.3 Mbox Event Selection

The Mbox reports several events, as selected by the PMM field in the PCCTL processor register.
The Mbox inputs to the PMCTR0 and PMCTR1 counters are shown in Table 16-4.

Table 16-4: Mbox Event Selection

PCCTL<PMM>
(binary)    PMCTR0 Input          PMCTR1 Input          Description; Use

000         P0/P1 I-stream TB     P0/P1 I-stream TB     TB hits for P0 and P1 I-stream references compared
            access                hit                   to total TB accesses for P0 and P1 I-stream
                                                        references; P0/P1 I-stream TB hit ratio.

001         P0/P1 D-stream TB     P0/P1 D-stream TB     TB hits for P0 and P1 D-stream references compared
            access                hit                   to total TB accesses for P0 and P1 D-stream
                                                        references; P0/P1 D-stream TB hit ratio.

010         S0 I-stream TB        S0 I-stream TB        TB hits for S0 I-stream references compared to total
            access                hit                   TB accesses for S0 I-stream references; S0 I-stream
                                                        TB hit ratio.

011         S0 D-stream TB        S0 D-stream TB        TB hits for S0 D-stream references compared to total
            access                hit                   TB accesses for S0 D-stream references; S0 D-stream
                                                        TB hit ratio.

100         I-stream Pcache       I-stream Pcache       Pcache hits for I-stream references compared to total
            access                hit                   Pcache accesses for I-stream references; I-stream
                                                        Pcache hit ratio.

101         D-stream Pcache       D-stream Pcache       Pcache hits for D-stream references compared to total
            access                hit                   Pcache accesses for D-stream references; D-stream
                                                        Pcache hit ratio.

111         Unaligned reads       Total reads and       Unaligned virtual reads and writes compared to total
            and writes            writes                virtual reads and writes; Unaligned references as a
                                                        percentage of all references.

16.2.3.4 Cbox Event Selection

The Cbox reports several events, as selected by the PM_ACCESS_TYPE and PM_HIT_TYPE
fields in the DIAG_CTL processor register. The Cbox inputs to the PMCTR0 counter are shown
in Table 16-5 and the Cbox inputs to the PMCTR1 counter are shown in Table 16-6.

Table 16-5: Cbox PMCTR0 Event Selection

DIAG_CTL<PM_ACCESS_TYPE>
(binary)    PMCTR0 Input

000         Bcache access. PMCTR0 increments when the Bcache processes any reference from
            the CPU.

001         Bcache IREAD access. PMCTR0 increments when the Bcache processes an
            instruction-stream read request.

010         Bcache DREAD access. PMCTR0 increments when the Bcache processes a data-stream
            read.

011         Full LW Write access. PMCTR0 increments when the Bcache processes a LW write
            request.

100         Byte/Word Write access. PMCTR0 increments when the Bcache processes a byte or
            word write, or write unlock.

101         Any Write access. PMCTR0 increments when the Bcache processes any write, or write
            unlock.

110         Pcache Invalidate. PMCTR0 increments when a pInvReq is received.

111         Stall cycles. PMCTR0 increments when hold_req or not tagOk is asserted at SYS_CLK
            leading edge.



Table 16-6: Cbox PMCTR1 Event Selection

DIAG_CTL<PM_HIT_TYPE>
(binary)    PMCTR1 Input

000         Bcache hit. PMCTR1 increments when a Bcache access results in any hit.

001         Bcache hit dirty. PMCTR1 increments when a Bcache access results in a dirty hit.

010         Bcache hit clean. PMCTR1 increments when a Bcache access results in a hit and the
            block is not dirty.

011         Bcache miss dirty. PMCTR1 increments when a Bcache access results in a miss in
            which both the valid and dirty bits were set.

100         Bcache hit shared. PMCTR1 increments when a Bcache access results in a hit in which
            both the valid and shared bits were set.

101         Stall Requests. PMCTR1 increments at SYS_CLK leading edge if a new hold_req or
            not tagOk is asserted.

16.2.4 Enabling and Disabling the Performance Monitoring Facility

The performance monitoring facility is enabled or disabled by setting or clearing the Performance 
Monitor Enable (PME) bit in the CPU. This bit may be written in one of three ways: with a write 
to the PME processor register, by loading a new value with a LDPCTX instruction from the PME 
bit in the new PCB, or by a direct write of the ECR<PMF_ENABLE> bit. 

The format of the PME processor register is shown in Figure 16-3. 



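Figure 16-3: PME Processor Register

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                            SBZ                                             |  | :PME
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
                                                                                               |
                                                                                     ENABLE ---+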



If PME<0> is written with a 1, the performance monitoring facility is enabled. If PME<0> is
written with a 0, the performance monitoring facility is disabled. Direct writes to
ECR<PMF_ENABLE> are similar to writes to PME<0>, with the exception that the hardware counters are
not automatically cleared, and the memory counters are not updated on an explicit write to
ECR<PMF_ENABLE>.

The CPU PME bit is also loaded by the LDPCTX instruction from PCB+92<31>.

CAUTION 

The longword at offset 58 (hex) from the SCB and the correct unique CPUID value for 
each CPU must be initialized before the performance monitoring facility is enabled. 
Failure to do so will result in UNDEFINED behavior of the system. 

The CPU PME bit is cleared, and the performance monitoring facility is disabled, at powerup. 



16.2.5 Reading and Clearing the Performance Monitoring Facility Counts

In normal operation, microcode automatically updates the memory counters by reading the cur- 
rent value of the hardware counters, adding these values to the memory counters, and clearing 
the hardware counters. This is the preferred mode of operation. 

However, there may be some situations in which software wishes to directly read or clear the 
hardware counters. The current value of the hardware counters may be read from the PMFCNT 
processor register, whose format is shown in Figure 16-4. 



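Figure 16-4: PMFCNT Processor Register

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|         Current Hardware PMCTR1 Value         |         Current Hardware PMCTR0 Value         | :PMFCNT
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+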



The current value of the 16-bit hardware PMCTR1 counter is returned in PMFCNT<31:16> and 
the current value of the 16-bit hardware PMCTR0 counter is returned in PMFCNT<15:0>. 
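
Splitting a PMFCNT value into its two counts is a matter of masking and shifting. The helper below is a minimal sketch; the function name is hypothetical, and the value passed in is assumed to be the longword already read from the PMFCNT processor register.

```python
# Split a PMFCNT longword into the two 16-bit hardware counts.

def decode_pmfcnt(value):
    """Return (pmctr0, pmctr1) from a 32-bit PMFCNT value."""
    pmctr0 = value & 0xFFFF            # PMFCNT<15:0>
    pmctr1 = (value >> 16) & 0xFFFF    # PMFCNT<31:16>
    return pmctr0, pmctr1
```

Because PMCTR1 counts a subset of the PMCTR0 events, the ratio pmctr1 / pmctr0 of a decoded pair gives the selected hit ratio or percentage when pmctr0 is nonzero.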

The two 16-bit hardware counters may be explicitly cleared by software by writing a 1 to
ECR<PMF_CLEAR>. If the counters are explicitly cleared, any outstanding interrupt request
is also cleared. It is strongly suggested that the hardware counters not be cleared while the
performance monitoring facility is enabled.

If the performance monitoring facility is configured to select the Ebox microword event
(ECR<PMF_PMUX>=Ebox, ECR<PMF_EMUX>=S5 microword event, ECR<PMF_ENABLE>=1), a write of
any value to the PMFCNT processor register will increment both hardware counters.

NOTE 

If the 16-bit hardware counters are explicitly cleared by writing a 1 to
ECR<PMF_CLEAR>, any count in these registers is lost and will not be included in the memory
counters.

TEST NOTE

The performance monitoring facility hardware incrementers may be tested by clearing
them via ECR<PMF_CLEAR>, selecting the Ebox S5 microword event, and enabling
the facility. Each write to the PMFCNT processor register will then increment both
hardware counters, and the result may be observed by reading the PMFCNT register.
The interrupt request may be tested by incrementing the PMCTR0 hardware counter
into bit<15>, which will cause an interrupt to be requested.

16.3 Hardware and Microcode Implementation of the Performance Monitoring 
Facility 

The performance monitoring facility is implemented via both CPU chip hardware and microcode. 
A block diagram of the performance monitoring hardware is shown in Figure 16-5.

The lower 16 bits of the PMCTR0 and PMCTR1 performance counters are implemented as two
16-bit incrementers in the Ebox. Both incrementers have a common clear line which is driven
from MISC/CLR.PERF.COUNT, and each has an increment input. The 32-bit concatenated value
from the incrementers can be read onto E%ABUS, and the upper bit of PMCTR0 is used to generate
E_PMN%PMON, the performance monitoring facility interrupt request.

The PMCTR0 and PMCTR1 increment inputs are supplied by PMUX0 and PMUX1, through two
AND gates. The PMCTR0 increment is gated by the master performance monitoring facility
enable. If the facility is not enabled, PMCTR0 does not increment. The PMCTR1 increment is
gated by the PMCTR0 increment, and is therefore a strict subset of PMCTR0.

The top-level selection of events is determined by ECR<PMF_PMUX>, which selects the source to
PMUX0 and PMUX1. This selects the source (Ibox, Ebox, Mbox, Cbox) of the increment events.
Distributed in the appropriate boxes are second-level muxes which are selected to provide the
actual source of the increment events for PMCTR0 and PMCTR1.

16.3.1 Hardware Implementation

The two 16-bit hardware counters are implemented as side-by-side incrementers in the Ebox
datapath (this hardware also implements the Wbus LFSR reducer that is described in the testability
section of Chapter 8). The increment signals for each of the counters are driven from two 4-to-1
muxes that are selected by ECR<PMF_PMUX>, and which select the appropriate source of inputs
to the incrementers.

Logic in the Ibox, Mbox, and Cbox selects the appropriate values to drive the two increment signals
based on processor register fields in the box. The Ebox increment signals are selected locally and
provide the fourth input to the muxes. The PMCTR1 increment signal is forced to be a subset of
the PMCTR0 increment signal by ANDing the raw PMCTR1 increment signal with the PMCTR0
increment signal to produce the final PMCTR1 increment signal.

Because the PMCTR1 increment is a strict subset of the PMCTR0 increment, the ultimate sources 
of the two increment signals align them such that they are valid in the same cycle. For example, 
if the selected conditions are IREAD PCACHE ACCESS and PCACHE HIT, these two signals are 
valid in the same cycle, and they refer to the same reference. Therefore the assertion of IREAD 
PCACHE ACCESS is delayed until the cycle in which PCACHE HIT is valid. In addition, 
the source of the increment signal guarantees that any event that may be retried is 
recorded only once. For example, a particular Pcache access causes only one increment, even if it is 
retried multiple times. 

When the 16-bit PMCTR0 counter increments into the high-order bit, an interrupt is requested 
by asserting the E_PMN%PMON_L signal to the interrupt section. This signal is sampled by 
edge-sensitive logic, so the interrupt request is maintained until it is cleared by writing a 1 to the 
appropriate bit in the INT.SYS register, even if the performance monitoring facility hardware 
counters are subsequently cleared. 

When the 16-bit PMCTR0 incrementer reaches its maximum value, subsequent increments of 
either incrementer are inhibited. In normal operation, this should not occur, but the counter may 
overflow if the interrupt request isn't serviced within several hundred microseconds, as would be 
the case if software spent an extended period of time at a high IPL with the performance monitoring 
facility enabled. 
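
The gating, interrupt, and saturation behavior described above can be summarized in a small behavioral sketch. This is only an illustrative Python model, not the chip logic: the class and method names are hypothetical, and the placement of PMCTR0 in the low half of the 32-bit read value is an assumption, since this section does not state the ordering.

```python
class PerfCounters:
    """Toy model of the two 16-bit incrementers (illustrative only)."""

    MAX = 0xFFFF

    def __init__(self):
        self.pmctr0 = 0
        self.pmctr1 = 0
        # Models the edge-sensitive request latched by the interrupt section.
        self.irq_latched = False

    def clock(self, enable, raw_inc0, raw_inc1):
        # The PMCTR0 increment is gated by the master enable.
        inc0 = enable and raw_inc0
        # The PMCTR1 increment is ANDed with the PMCTR0 increment.
        inc1 = inc0 and raw_inc1
        # Once PMCTR0 reaches its maximum, both incrementers are inhibited.
        if self.pmctr0 == self.MAX:
            return
        if inc0:
            before = self.pmctr0
            self.pmctr0 += 1
            # Incrementing into bit<15> requests the interrupt.
            if not (before & 0x8000) and (self.pmctr0 & 0x8000):
                self.irq_latched = True
        if inc1:
            self.pmctr1 += 1

    def clear(self):
        # Models the common clear line; the latched request survives it.
        self.pmctr0 = 0
        self.pmctr1 = 0

    def read(self):
        # Assumption: PMCTR0 occupies the low 16 bits of the 32-bit value.
        return (self.pmctr1 << 16) | self.pmctr0
```

Because PMCTR1 can only increment when PMCTR0 does, PMCTR1 never exceeds PMCTR0 in this model, which is why only PMCTR0 needs a saturation check.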

The 32-bit concatenated value of the two 16-bit hardware incrementers can be read onto E%ABUS 
when selected by A/PERF.COUNT. This is the mechanism by which microcode retrieves the 
current values of the two incrementers. 

16.3.2 Microcode Interaction with the Hardware 

There are several points at which the microcode interacts with the performance monitoring facility 
hardware. At powerup, microcode clears both of the 16-bit hardware incrementers and any 
potential interrupt request. 

MICROCODE RESTRICTION 

If the performance monitoring facility hardware incrementers are cleared in cycle 'n' via 
MISC/CLR.PERF.COUNT, INT.SYS<28> must be written with a 1 no earlier than cycle 
'n+3' to guarantee that the interrupt request is cleared. This delay is due to latency 
introduced between the performance monitoring facility hardware and the interrupt 
section. 

Microcode reads the current value of the hardware incrementers via A/PERF.COUNT as a 
byproduct of a read of the PMFCNT processor register, and as part of the process of updating the memory 
counters. 

Microcode clears the hardware incrementers via MISC/CLR.PERF.COUNT when 
ECR<PMF_CLEAR> is written with a 1. Microcode also clears the incrementers after reading 
and updating the memory counters. 

Microcode uses the CPUID processor register value to find the pair of quadwords that contain 
the performance counter values for this CPU. This value must be correctly initialized by either 
console firmware or software before the performance monitoring facility is enabled. The operation 
of the processor is UNDEFINED if CPUID is not correctly initialized. 

The memory counters are updated under three circumstances: when a performance monitoring 
facility interrupt is serviced, when the facility is disabled via a write to the PME processor register, 
and when the facility is disabled by loading a new value of PME in LDPCTX. The memory updates 
are done in a common subroutine by disabling the facility by clearing ECR<PMF_ENABLE>, 
reading the current value of the hardware incrementers and then clearing them, and updating 
each quadword in memory with the appropriate 16-bit hardware value. 

16.4 Revision History 
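
The memory update step can be sketched as follows. This is a hedged illustration, not the microcode: the function name, the list-of-integers memory model, the `2 * cpuid` indexing of the quadword pair, the order of the two quadwords, and the use of addition to "update" each quadword are all assumptions made for the example.

```python
MASK64 = (1 << 64) - 1  # a quadword is 64 bits


def update_memory_counters(memory, cpuid, pmctr0, pmctr1):
    """Fold the two 16-bit hardware values into this CPU's quadword pair.

    `memory` is a hypothetical list of 64-bit counters, two per CPU.
    """
    base = 2 * cpuid  # assumed layout: pair N belongs to CPUID N
    memory[base] = (memory[base] + (pmctr0 & 0xFFFF)) & MASK64
    memory[base + 1] = (memory[base + 1] + (pmctr1 & 0xFFFF)) & MASK64
    # The caller is expected to clear the hardware incrementers afterwards,
    # as the text describes.
    return memory
```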



Table 16-7: Revision History 

Who            When           Description of change 

Mike Uhler     12-Jan-1990    Initial release 
Mike Uhler     02-Jul-1990    Update to reflect implementation 
Gil Wolrich    01-Feb-1991    Detail NVAX Plus Cbox inputs 

Chapter 17 

Testability Micro-Architecture 



17.1 Chapter Overview 

This chapter describes the NVAX PLUS chip Testability Micro-Architecture. 

17.2 The Testability Strategy 

The NVAX PLUS chip testability strategy addresses the broad issue of providing cost-effective 
and thorough testing during many life cycle testing phases. The strategy specifically implements 
test features to support 

• chip debug 

• high fault coverage test at wafer probe and packaged chip test 

• "reduced probe contact" wafer probe test 

• effective chip burn-in test 

The strategy uses a variety of testability techniques and approaches that are best suited to 
address the specific functional elements in the chip. The cost-effective implementation is realized 
by the appropriate consideration of global issues, by unifying the test objectives, by sharing test 
resources and by exploiting features inherent in the chip. The strategy also relies on leveraging off 
the design verification patterns in developing production test patterns to meet the fault coverage 
goals. 

The test features are implemented such that they have no effect on the targeted performance of 
the chip. 

17.3 Test Micro-Architecture Overview 

The NVAX Plus Test Micro-Architecture consists of two principal elements: the Test Interface Unit 
and the Testability Features. 

Test Interface Unit 

The Test Interface Unit (TIU) implements a comprehensive test access strategy for NVAX Plus. 
It permits efficient access to the testability features implemented on the chip. 

The Parallel Test port is used for accessing internal scan registers and test features which benefit 
from parallel access (for example, microaddress bus). 

For NVAX Plus, the parallel test port consists of the ICMODE_H[1] pin, data pins PP_DATA[7:0] and 
PP_DATA[11], three tagAdr pins (TAGADR_H[19:17]) which multiplex the PP_DATA[10:8] signals, and three 
input pins, ICMODE_H[0] and PP_CMD_H[1:0], which receive the parallel port command. 

The parallel port must be enabled in order for test data to be driven to the parallel port pins. 
The port may be enabled and operated in two configurations: STANDARD and OVERRIDE. 

In STANDARD configuration, ICMODE_H[1] must be deasserted and the default parallel port mode 
is OBSERVE MAB (observe the microcode address bus). The parallel port may be enabled by 
writing a 1 to DIAG_CTL[MAB_EN]. When enabled in STANDARD configuration, MAB data will 
be output to dedicated parallel port pins PP_DATA[7:0] and PP_DATA[11] as described in Table 17-2. 
The remaining bits of the MAB will be conditionally output to multiplexed pins TAGADR[19:17] 
based on system configuration as determined from BIU_CTL[BC_SIZE]. If BIU_CTL[BC_SIZE] 
specifies that a tagAdr pin is NOT included in the tag comparison, then the pin will function as 
a parallel port data pin: 

• TAGADR_H[17] is included in the tag comparison only if BIU_CTL[BC_SIZE] is '000 (Bcache 
size is 128 Kbytes) 

• TAGADR_H[18] is included in the tag comparison only if BIU_CTL[BC_SIZE] is '000 or '001 
(Bcache size is 128 Kbytes or 256 Kbytes) 

• TAGADR_H[19] is included in the tag comparison only if BIU_CTL[BC_SIZE] is '000 or '001 or '010 
(Bcache size less than 1 Mbyte). 
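
The pin-sharing rule in the list above can be expressed as a small decision function. This is a sketch under stated assumptions: the function and table names are hypothetical, and the booleans `mab_en` and `icmode_h1` stand in for DIAG_CTL[MAB_EN] and the ICMODE_H[1] pin.

```python
# Largest BC_SIZE encoding for which each tagAdr pin still takes part in
# the tag comparison, taken from the list above.
LAST_BC_SIZE_IN_TAG_COMPARE = {17: 0b000, 18: 0b001, 19: 0b010}


def tagadr_drives_pp_data(pin, bc_size, mab_en, icmode_h1):
    """Return True if TAGADR_H[pin] carries parallel port data."""
    if pin not in LAST_BC_SIZE_IN_TAG_COMPARE:
        raise ValueError("only TAGADR_H[19:17] are multiplexed")
    if icmode_h1:
        # OVERRIDE configuration ignores both MAB_EN and BC_SIZE.
        return True
    in_tag_compare = bc_size <= LAST_BC_SIZE_IN_TAG_COMPARE[pin]
    # STANDARD configuration: the port must be enabled and the pin must
    # not be needed for the tag comparison.
    return mab_en and not in_tag_compare
```

For example, with BC_SIZE set to '001 in STANDARD configuration, only TAGADR_H[17] is free to carry MAB data.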

In OVERRIDE configuration, ICMODE_H[1] must be asserted and the ICMODE_H[0] and PP_CMD[1:0] 
pins determine the parallel port mode as shown in Table 17-2. Assertion of ICMODE_H[1] 
immediately enables the parallel port, overriding the state of DIAG_CTL[MAB_EN] and 
BIU_CTL[BC_SIZE]. ALL parallel port output pins (including tagAdr multiplexed pins) will drive parallel port 
data regardless of the state of DIAG_CTL[MAB_EN] or BIU_CTL[BC_SIZE]. 

DIAG_CTL[MAB_EN] is cleared with the reset signal, not by microcode, and causes parallel port 
output pins to be tristated in STANDARD configuration. This bit must be set by software to 
drive the parallel port data to the pins. OVERRIDE configuration ignores the state of this bit, of 
course. 

NVAX Plus supplies a feature for reducing the number of probes required for wafer probe. Since 
a tester may not supply enough probes for every pin on the chip, certain pins can be completely 
omitted from wafer probe (with a small associated reduction in test coverage). The pins which 
can be omitted were selected for their low amount of critical functionality, and are: 



Pin Names Direction Number 

check_h[27:0] B 28 

adr_h[12:5] T 8 

vref I 1 

NVAX Plus has 291 signal pins. This feature removes 37 pins from probe requirements, and 
allows a tester with only 254 signal pins to be used for wafer probe. Assertion of TEST_MODE_H 
pulls input-only and bidirectional signals internally to a logic 0 level, to ensure that valid logic levels 
are maintained during testing. TEST_MODE_H should not be asserted under any conditions where 
designated input or bidirectional pins are driven from an external source. Note also that test 
software must handle the logic 0 levels which are driven on the check bits when in this mode (i.e. 
tests should run with ECC checking disabled). 
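
As a quick arithmetic check of the reduced-probe figures, the widths listed in the pin table can be summed and subtracted from the total signal pin count. The dictionary below simply restates the table; the variable names are illustrative.

```python
# Pin groups that may be omitted from wafer probe, with their widths
# as listed in the table above.
OMITTED_PINS = {"check_h[27:0]": 28, "adr_h[12:5]": 8, "vref": 1}

TOTAL_SIGNAL_PINS = 291

omitted = sum(OMITTED_PINS.values())
tester_pins_needed = TOTAL_SIGNAL_PINS - omitted

print(omitted, tester_pins_needed)
```

The listed widths sum to 37, which is consistent with the 254-pin tester figure (291 - 37 = 254).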

The Test Pads primarily facilitate micro-probing during chip debug. These pads are located at 
strategic nodes throughout the chip. 

NVAX Plus uses the Serial ROM port, consisting of SROMD_H, SROMCLK_H, SROMOE_L, and 
ICMODE_H[0], which determines whether to input from the SROM at reset_l, allowing the Pcache 
to be loaded serially at reset for diagnostics. This feature also provides support for convenient 
self-test operation during the chip burn-in test. 

In addition to these test ports, NVAX Plus also uses the normal system port (pins) for test access. 
This access consists of using the VAX instructions to manipulate a testability feature or to perform 
the actual tests on the chip's logic. 

Table 17-1 summarizes the dedicated test pins for NVAX Plus. 



Table 17-1: NVAX Plus Test Pins 

Pin Name           Pin Type   Pin Function 

ICMODE_H[1]        I          Selects parallel port OVERRIDE configuration 
ICMODE_H[0]        I          NVAX PP_CMD_H[2], Read SROM at reset 
PP_CMD_H<1:0>      I          Parallel Port: NVAX PP_CMD_H<1:0> if enabled 
PP_DATA_H<11>      T          Parallel Port: NVAX PP_DATA_H<11> if enabled 
PP_DATA_H<7:0>     T          Parallel Port: NVAX PP_DATA_H<7:0> if enabled 
TAGADR_H<19:17>    B          Parallel Port: NVAX PP_DATA_H<10:8> if enabled 
TRISTATE_L         I          Disables (tri-states) all output drivers 
CONT_L             I          Continuity for testing 
SROMD_H            I          Serial data in 
SROMCLK_H          O          CLK or serial data out 
SROMOE_L           O          SROM output enable 
TEST_MODE_H        I          Selects Reduced-Wafer-Probe Mode 



17.4 Parallel Test Port 

This port allows the critical chip nodes to be either controlled or monitored in parallel. ICMODE<1> 
enables the parallel port select pins ICMODE_H<0> and PP_CMD_H<1:0> as parallel port command 
inputs. Note that ICMODE<0> is used as sRomFast at reset. If ICMODE<1> is asserted at reset, then 
ICMODE<0> is used as PP_CMD and sRomFast simultaneously. The port consists of 16 test pins 
as follows: 

• ICMODE_H[1]: selects OVERRIDE configuration for parallel port. 

• PP_DATA_H<11>: same function as NVAX PP_DATA_H<11> in OVERRIDE, outputs internal 
phi_2 if in STANDARD configuration and BIU_CTL[MAB_EN] is set. 

• PP_DATA_H<5:0>: same function as NVAX PP_DATA_H<5:0> in OVERRIDE, outputs MAB<5:0> 
if in STANDARD configuration and BIU_CTL[MAB_EN] is set. 

• PP_DATA_H<7:6>: same function as NVAX PP_DATA_H<7:6> in OVERRIDE, outputs MAB<7:6> 
if in STANDARD configuration and BIU_CTL[MAB_EN] is set. 

• TAGADR_H<17>: same function as NVAX PP_DATA_H<8> in OVERRIDE, outputs MAB<8> if 
in STANDARD configuration and BIU_CTL[MAB_EN] is set and Bcache size is greater than 
128 Kbytes. 

• TAGADR_H<18>: same function as NVAX PP_DATA_H<9> in OVERRIDE, outputs MAB<9> if 
in STANDARD configuration and BIU_CTL[MAB_EN] is set and Bcache size is greater than 
256 Kbytes. 

• TAGADR_H<19>: same function as NVAX PP_DATA_H<10> in OVERRIDE, outputs MAB<10> if 
in STANDARD configuration and BIU_CTL[MAB_EN] is set and Bcache size is greater than 
512 Kbytes. 

• ICMODE_H<0>: same function as NVAX PP_CMD_H<2> in OVERRIDE. 

• PP_CMD_H<1:0>: same function as NVAX PP_CMD_H<1:0> in OVERRIDE. 

17.4.1 Parallel Port Operation 

Internal Scan Register 

When shifting, the ISR bits are serial-to-parallel converted. They change every third cycle on 
internal phi_4. This gives usable time with respect to sysClkOut1_h. The parallel port commands 
are captured synchronously with respect to sysClkOut1_h, at the falling edge. In order to give 
full flexibility in capturing a given internal cycle, a mechanism is provided to delay the 
capture-and-start-shifting event by 0, 1, or 2 cycles. This delay is determined by the state of the parallel 
port bits (pp_cmd_h<1:0>) immediately before entering the Shift ISR mode. ('00' corresponds to 
zero delay, '01' corresponds to a one-cycle delay, and '10' corresponds to a two-cycle delay.) 
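
The delay encoding can be captured in a lookup, shown here as a minimal sketch. The names are hypothetical, and the '11' encoding is rejected only because the text does not define it.

```python
# pp_cmd_h<1:0> value sampled immediately before entering Shift ISR mode,
# mapped to the capture-and-start-shifting delay in cycles.
ISR_CAPTURE_DELAY_CYCLES = {0b00: 0, 0b01: 1, 0b10: 2}


def isr_capture_delay(pp_cmd):
    """Return the capture delay in cycles for a 2-bit pp_cmd_h<1:0> value."""
    code = pp_cmd & 0b11
    if code not in ISR_CAPTURE_DELAY_CYCLES:
        raise ValueError("delay for pp_cmd_h<1:0> = '11' is not defined in the text")
    return ISR_CAPTURE_DELAY_CYCLES[code]
```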

See the timing diagrams in Figure 17-2. 

Note that the initial packets of ISR data contain data from before the load event from the last 
bit on the chain. After one or two samples, this data is all valid sampled data. 

MAB Access 

For full speed MAB observation, an internal clock is provided which will allow synchronous capture 
by a DAS in any debug environment. Figure 17-1 shows the self-relative timing during 
Observe MAB mode. 

The following modes of the parallel port can be selected from ICMODE_H<0>/DWSEL_H<1:0> in 
test mode. 

Figure 17-2: Internal Scan Register Operation Timing 

Table 17-2: Parallel Port Operating Modes 

Command pins: ICMODE<0>/PP_CMD_H<1:0> (NVAX PP_CMD_H<2:0>) 
Data pins: PP_DATA_H<11>/TAGADR_H<19:17>/PP_DATA_H<7:0> (NVAX PP_DATA_H<11:0>) 

PP_CMD_H<2:0>   Port Mode               PP_DATA_H<11:0>    Signals Controlled/Observed 

111 / XXX       Observe MAB (default)   PP_DATA_H<11>      Internal phi_2 
                                        PP_DATA_H<10:0>    E-Box MAB 
110             Observe M-Box           PP_DATA_H<11:9>    S5 Reference Source 
                                        PP_DATA_H<8:4>     S5 command 
                                        PP_DATA_H<3>       M%MME_FAULT_H 
                                        PP_DATA_H<2>       S5 Abort 
                                        PP_DATA_H<1>       S5 TB Miss 
                                        PP_DATA_H<0>       S5 PCache Hit 
101             Observe C-Box/M-Box     PP_DATA_H<11:7>    C-box arb_state<4:0> 
                                        PP_DATA_H<6:4>     M-box MD Destination 
                                        PP_DATA_H<3:0>     M-box MME State 
100             Observe I-Box           PP_DATA_H<11>      Internal phi_2 
                                        PP_DATA_H<10:7>    Undefined 
                                        PP_DATA_H<6:0>     I-MAB 
011             Enable LFSR Mode        PP_DATA_H<11:0>    Undefined 
010             Undefined               PP_DATA_H<11:0>    Undefined 
001             Shift ISRs              PP_DATA_H<11:3>    ISR1 (Control Store data) 
                                        PP_DATA_H<2:0>     ISR2 (Other internal scan data) 
000             Force MAB               PP_DATA_H<11:0>    Undefined 
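
For test software that drives the command pins, the mode selection in Table 17-2 can be held in a lookup keyed by the 3-bit command. This is an illustrative sketch: the names are hypothetical, and it relies on Table 17-1's statement that ICMODE_H[0] serves as NVAX PP_CMD_H[2].

```python
# Port modes from Table 17-2, keyed by the 3-bit command
# {ICMODE_H<0>, PP_CMD_H<1:0>}.
PORT_MODES = {
    0b111: "Observe MAB",
    0b110: "Observe M-Box",
    0b101: "Observe C-Box/M-Box",
    0b100: "Observe I-Box",
    0b011: "Enable LFSR Mode",
    0b010: "Undefined",
    0b001: "Shift ISRs",
    0b000: "Force MAB",
}


def port_mode(icmode_h0, pp_cmd_h):
    """Decode the OVERRIDE-configuration port mode from the command pins."""
    code = ((icmode_h0 & 1) << 2) | (pp_cmd_h & 0b11)
    return PORT_MODES[code]
```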



17.5 Test Pads 

This port consists of strategic internal nodes brought out to the top level of metal in the form of 
3x3 micron test pads. These pads will be accessed by probes during chip debug and wafer probe 
manufacturing tests. The access primarily provides observability of these nodes; however, 
controllability may also be provided where appropriate. See the testability sections in box chapters 
for the list of nodes brought out on the top metal layer. 

17.6 System Port 

This is simply the normal system I/O of the chip. It is identified as a test access port for two 
reasons: 

• It is used to provide the read/write access to testability features via the VAX architecture's 
MFPR and MTPR instructions. 

• It provides the natural resource for testing the chip via the macro-code based tests. 
See the individual box chapters for the list of specific architectural features provided. 

It is difficult to achieve high test coverage in the burn-in and life-test environments due to 
limited test pattern bandwidth and the difficulty in synchronizing test equipment to the NVAX 
Plus chip. Using this serial port, burn-in and life-test programs can load the real "test program" 
into Pcache, where the chip can perform a self-test. 

This scheme minimizes test pattern bandwidth, allows for asynchronous transmission of the serial 
data, provides a means to stimulate multiple chips under test which are running asynchronously, 
and supplies a means to achieve high test coverage. 

17.7 tristate_l 

NVAX Plus chip has a dedicated pin TRISTATE_L. When asserted low, the CPU chip tri-states 
output drivers on all output-only and bidirectional pins, except the following: 

• CPUCLKOUT_H 

• SYSCLKOUT1_H 

• SYSCLKOUT1_L 

• SYSCLKOUT2_H 

• SYSCLKOUT2_L 

The single pin tristate functionality is used only during testing. 

17.8 cont_l 

NVAX Plus chip has a dedicated pin CONT_L. When asserted low, NVAX Plus connects all of its 
pins to VSS, with the exception of these pins: 

• CLKIN_H 

• CLKIN_L 

• CONT_L 

• CPUCLKOUT_H 

• DCOK_H 

• RESET_L 

• SYSCLKOUT1_H 

• SYSCLKOUT1_L 

• SYSCLKOUT2_H 

• SYSCLKOUT2_L 

• TESTCLKIN_H 

• TESTCLKIN_L 

• TRISTATE_L 

CONT_L should only be used at test in conjunction with TRISTATE_L. 

17.9 Revision History 



Table 17-3: Revision History 

Who            When           Description of change 

Gil Wolrich    15-Nov-1990    Release for external review. 
Gil Wolrich    01-Aug-1991    Update 
Tim Fischer    29-Aug-1991    Pass 1 Implementation Update 

Chapter 18 

AC/DC Characteristics 

This chapter contains the AC and DC specifications for NVAX Plus. Timing parameters are given 
for the nominal speed binned (14ns) parts. Variations for binned parts are tbd. 



18.1 Input Clocks 

The input clocks clkIn_h,_l and testClkIn_h,_l are received differentially, then XORed to provide 
the time-base for NVAX Plus when dcOk_h is asserted. We expect testClkIn_h,_l to be used only 
by testers unable to drive clkIn_h,_l at full speed. The terminations on these signals are designed 
to be compatible with system oscillators of arbitrary DC bias. Schematically, they look as follows: 

[Schematic of the input clock termination. Recoverable labels: PIN, Cpkg, PAD (to diff-amp), 
50ohm resistor, Hi-Z resistor, 40pF capacitor, Vbias = (Vdd-Vss)/2.] 

This is designed to approximate a 50ohm termination for the purpose of impedance matching 
for those systems (if any) which drive input clocks across long traces. Furthermore, the high 
impedance bias driver allows a clock source of arbitrary DC bias to be AC coupled to NVAX Plus. 
The peak-to-peak amplitude of the clock source must be between 0.6V and 3.0V as seen by NVAX 
Plus. Either a "square-wave" or a sinusoidal source may be used. Note that full-rail clocks may 
be driven by testers. 

The following table lists the input clock cycle times for the various NVAX Plus bin speeds. Note 
that these periods equal one-quarter the corresponding cpu cycle times. 

Table 18-1: Input Clock Timing 

Name                Fast Bin     Nominal Bin   Slow Bin     Unit 

clkIn period min    2.5          3.5           3.5          nS 
clkIn period max    tbd          tbd           tbd          nS 
clkIn symmetry      50%+/-10%    50%+/-10%     50%+/-10%    percent 
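
The one-quarter relationship between the input clock period and the cpu cycle time can be checked against the table values with a short calculation. The function name is illustrative.

```python
PHASES_PER_CPU_CYCLE = 4  # each cpu cycle consists of four clkIn periods


def cpu_cycle_ns(clkin_period_ns):
    """Return the cpu cycle time for a given clkIn period, both in ns."""
    return PHASES_PER_CPU_CYCLE * clkin_period_ns


for name, period in (("fast bin", 2.5), ("nominal bin", 3.5)):
    print(name, cpu_cycle_ns(period), "ns cpu cycle")
```

The nominal 3.5 ns input period gives the 14 ns cpu cycle quoted for the nominal parts, and the 2.5 ns fast-bin period gives 10 ns.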



18.2 cpuClkOut_h 



The cpuClkOut_h signal is expected to be used only by an ECL synchronizer in systems using 
the tagOk protocol. In order to accommodate ECL levels, the driver consists of only a PMOS 
pullup device. ECL 100K levels may be constructed with a 50ohm board resistor in series with 
the driver and a 100ohm board resistor between the load and (Vdd - 2V). CMOS Vdd must equal 
ECL Vcc in this scheme. Note that the trace must be short to ensure good signal integrity if, as 
expected, the board impedance is not in the vicinity of 100ohm. 



18.3 Test Configuration 

All outputs and bidirectional signals including clocks but excluding cpuClkOut_h are specified 
with respect to a standard 40pF load as shown below. All timing is specified with respect to the 
crossing of standard TTL input levels at 0.8V and 2.0V. 

[Schematic: an NVAX Plus PIN driving a 40pF load to GND.] 



18.4 Fast Cycles on External Cache 

From a system standpoint, fast cycles on the external cache are completely unclocked. The two 
cases of read and write cycles require separate treatment. 

18.4.1 Fast Read Cycles 

External logic must meet the maximum flow-through delay, as defined with respect to the circuits 
below. 

[Schematic, two cases. Left: an NVAX Plus PIN driving Address/Control into a 40pF load to GND. 
Right: an NVAX Plus PIN driving Address/Control into External Logic, which returns Data to an 
NVAX Plus PIN.] 

"Address" refers to adr_h and dataA_h. "Control" refers to dataCEOE_h and tagCEOE_h. "Data" 
refers to data_h, check_h, tagAdr_h, and tagCtl_h. Assume that address/control is driven from 
the same NVAX Plus internal clock edge in the two cases above. External flow-through delay 
is defined as the delay between address/control valid to the 40pF standard load in the left-hand 
case and data valid to NVAX Plus in the right-hand case. 

The external flow-through delay may not exceed CACHE_SPEED (i.e. 2, 3, or 4 cpu cycles as 
set in the BIU_CTL register) plus 1 additional clock phase. Thus if CACHE_SPEED is set to 2 
cpu cycles the flow-through delay must not exceed 9 times the clkIn period, if CACHE_SPEED 
is set to 3 cpu cycles the flow-through delay must not exceed 13 times the clkIn period, and if 
CACHE_SPEED is set to 4 cpu cycles the flow-through delay must not exceed 17 times the clkIn 
period. One phase (a single clkIn period) is reserved to allow NVAX Plus setup time for latching 
the Data. The Tag Compare function is deferred to the next internal cycle and does not subtract 
from the time available to the flow-through path. NVAX Plus guarantees that its address drivers 
are enabled at least one cpu cycle prior to a fast cache access, such that adr_h need never be 
pulled down from 5V during the cycle. 

NOTE 

The NVAX Plus Address Driver is designed for point to point, or daisy chain 
loading with NVAX Plus driving from one endpoint of the etch. 
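
The flow-through budget follows directly from four clkIn periods per cpu cycle plus one extra phase. The sketch below reproduces the three limits quoted above; the function names are illustrative and the nanosecond figure assumes a caller-supplied clkIn period.

```python
PHASES_PER_CPU_CYCLE = 4


def max_flow_through_periods(cache_speed):
    """Maximum external flow-through delay, in clkIn periods."""
    if cache_speed not in (2, 3, 4):
        raise ValueError("CACHE_SPEED must be 2, 3, or 4 cpu cycles")
    # CACHE_SPEED cpu cycles plus one additional clock phase.
    return PHASES_PER_CPU_CYCLE * cache_speed + 1


def max_flow_through_ns(cache_speed, clkin_period_ns):
    """Same limit expressed in ns for a given clkIn period."""
    return max_flow_through_periods(cache_speed) * clkin_period_ns
```

For a nominal 3.5 ns clkIn period and CACHE_SPEED of 2, the limit works out to 9 x 3.5 = 31.5 ns.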



18.4.2 Fast Write Cycles 

External logic must guarantee that fast writes complete for the following NVAX Plus timing. The 
write pulse width is 4 times the clkIn period if CACHE_SPEED is set to 2 cpu cycles, 8 
times the clkIn period if CACHE_SPEED is set to 3 cpu cycles, and 12 times the clkIn period if 
CACHE_SPEED is set to 4 cpu cycles. The data is driven 1 clkIn period before dataWE_h 
and tagCtlWE_h assert and is held for 3 clkIn periods after dataWE_h and tagCtlWE_h deassert 
for all selections of CACHE_SPEED. The address becomes valid during the write probe cycle, and 
holds for 5 clkIn periods after dataWE_h and tagCtlWE_h deassert. 

[Timing diagram: fast write cycle, showing the PROBE, COMPARE, and WRITE drive cycles against 
DRV CLK, CPU CLK, and phases 1 through 4, with the ADDRESS, DATA, and WRITE_EN waveforms.] 

The timing of pMapWE_h[1..0] during dcache read hits has the same pulse width, and address 
setup and hold as dataWE_h and tagCtlWE_h. 
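
The fast write figures can be tabulated per CACHE_SPEED setting. In this sketch the expression `4 * (cache_speed - 1)` is only a fit to the three pulse widths stated above (4, 8, and 12 clkIn periods), not a formula given in the specification; the function and key names are hypothetical.

```python
def fast_write_timing(cache_speed):
    """Return fast write timing in clkIn periods for a CACHE_SPEED setting."""
    if cache_speed not in (2, 3, 4):
        raise ValueError("CACHE_SPEED must be 2, 3, or 4 cpu cycles")
    return {
        # Fitted to the stated widths of 4, 8, and 12 clkIn periods.
        "write_pulse_width": 4 * (cache_speed - 1),
        # The remaining values are the same for all CACHE_SPEED selections.
        "data_setup_before_we": 1,
        "data_hold_after_we": 3,
        "address_hold_after_we": 5,
    }
```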

18.4.3 CEOE timing 

The rising edge of sysClkOut1_h always coincides with internal clock phase 1. The chip enable/output 
enable signals tagCEOE and dataCEOE have internal phase 2 timing. As a result these signals 
may deassert 1 clkIn period after Hold_ack is asserted and 1 clkIn period after the CREQ lines 
assert. 



18.5 External Cycles 

All external cycle timing is referenced to the rising edge of sysClkOut1_h. Input setup and hold 
times and output delay and enable times are referenced to the point at which sysClkOut1_h 
crosses 0.8V. (Output enable time is defined as output delay time from a tri-stated state. It 
may differ from the nominal delay because it may entail pulling down from a 5V level.) Output 
hold times are referenced to the point at which sysClkOut1_h crosses 2.0V. They denote the times 
beyond sysClkOut1_h for which outputs hold their valid values from the previous cycle. Note that 
these times are negative, meaning that data may lose validity BEFORE sysClkOut1_h becomes 
valid high. (This is possible because there is no cause-effect relationship between the system 
clock outputs and data. In fact, the system clock outputs are nothing more than data pins which 
happen to switch in a fixed pattern.) Address enable timing is relevant only for systems using the 
holdReq protocol with two cpu cycles per system cycle. All bidirectional lines may be considered 
enabled or disabled simultaneously with the rising edge of sysClkOut1_h. 

Table 18-2: External Cycles 

Name                                                      Min     Max    Units 

Enable, sysClkOut1_h to 
  adr_h, data_h, check_h                                          2.9    nS 
Output Delay, sysClkOut1_h to 
  adr_h, data_h, check_h, cReq_h, cWMask_h, holdAck_h             1.5    nS 
Output Hold, sysClkOut1_h to 
  adr_h, data_h, check_h, cReq_h, cWMask_h, holdAck_h     -1.5           nS 
Input Setup relative to sysClkOut1_h 
  cAck_h, dRAck_h, dWSel_h, dOE_l                         9.3            nS 
  holdReq_h                                               4.8            nS 
  dInvReq_h, iAdr_h                                       4.5            nS 
  data_h, check_h                                         3.5            nS 
Input Hold relative to sysClkOut1_h 
  cAck_h, dRAck_h, dWSel_h, dOE_l                         0              nS 
  data_h, check_h                                         0              nS 
  holdReq_h, dInvReq_h, iAdr_h                            0              nS 



18.6 tagEq 

When active during external cache hold, the timing of tagEq_l is specified from when its inputs 
become valid at the NVAX Plus pins. 

Table 18-3: tagEq 

Name                          Min     Max     Units 

Delay, adr_h -> tagEq_l               17.0    nS 
Delay, tagAdr_h -> tagEq_l            17.0    nS 

18.7 tagOk 

The tagOk_h,_l signals are expected to be driven to NVAX Plus directly from the final stage of 
an ECL synchronizer clocked by cpuClkOut_h. As in the case of fast external cache cycles, the 
system must meet a maximum flow-through delay. This delay is defined with respect to the 
circuits below. 

[Schematic, two cases. Left: the NVAX Plus cpuClkOut_h PIN driving through a 50ohm resistor into 
a 10pF load, with a 100ohm resistor to Vdd-2.0V. Right: the NVAX Plus cpuClkOut_h PIN driving 
External Logic, which returns tagOk_h,_l to an NVAX Plus PIN.] 

Assume that cpuClkOut_h is driven from the same NVAX Plus internal clock edge in the two cases 
above. External flow-through delay is defined as the delay between cpuClkOut_h valid to the 10pF 
ECL "standard" load in the left-hand case and tagOk_h,_l valid to NVAX Plus in the right-hand 
case. It may not exceed the nominal cpu cycle time less 3.9ns. Note that board resistors must be 
part of "external logic" in the circuit on the right. For purposes of this specification, cpuClkOut_h 
is considered valid when it crosses the ECL threshold "Vbb" (equal to roughly Vcc - 1.3V) and 
tagOk is considered valid when the differential lines cross each other. 
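
The tagOk flow-through limit is a simple subtraction from the cpu cycle time. The sketch below evaluates it for the two cycle times that appear elsewhere in this chapter (14 ns and 10 ns); the names are illustrative.

```python
TAGOK_MARGIN_NS = 3.9  # subtracted from the nominal cpu cycle time


def max_tagok_flow_through_ns(cpu_cycle_ns):
    """Maximum external flow-through delay for tagOk, in ns."""
    limit = cpu_cycle_ns - TAGOK_MARGIN_NS
    if limit <= 0:
        raise ValueError("cpu cycle time leaves no budget for external logic")
    return limit


for cycle in (14.0, 10.0):
    print(cycle, "ns cycle ->", round(max_tagok_flow_through_ns(cycle), 1), "ns")
```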



18.8 Tester Considerations 



18.8.1 Asynchronous Inputs 

The signals reset_l, irq_h, and sRomD_h (in serial port mode) are asynchronous during normal 
system operation. However, for test purposes they should be driven synchronously with 
sysClkOut1_h with the timing given below. Note once again that these parameters are given with 
respect to the time at which the rising edge of sysClkOut1_h crosses 0.8V. 

Table 18-4: Asynchronous Signals on a Tester 

Name                               Min     Max     Units 

Setup, reset_l -> sysClkOut1_h     5.0             nS 
Setup, irq_h -> sysClkOut1_h       5.0             nS 
Hold, irq_h -> sysClkOut1_h        0               nS 
Setup, sRomD_h -> sysClkOut1_h     5.0             nS 
Hold, sRomD_h -> sysClkOut1_h      0               nS 



18.8.2 Signals Timed from Cpu Clock 

Due to the speed of NVAX Plus, it is expected that at-speed testing will be done with tester cycle 
equal to system cycle (i.e. sysClkOut1_h). However, fast external cache operation and serial ROM 
operation are timed as a function of the CACHE_SPEED field of the BIU_CTL register. Therefore, 
input sampling and output enabling and switching may occur at different time points within a 
tester cycle from one cycle to the next. If sysClkOut and BIU_CTL<CACHE_SPEED> are selected 
as the same multiple of cpu cycle the timing is completely deterministic. For sysClkOut <- 2, 
and CACHE_SPEED <- 2 all cache cycles start with respect to the falling edge of sysClkOut1_h. 
For sysClkOut <- 3 and CACHE_SPEED <- 2 (as in COBRA) the timing of cache related signals 
relative to sysClkOut can slip to any one of three positions within the sysClkOut cycle. 

The serial ROM outputs sRomOE_l and sRomClk_h may be strobed with the same timing as the 
data_h pins when driven by NVAX Plus. The serial ROM input sRomD_h may be switched with 
the same timing used in serial port mode. 

18.9 DC Characteristics 

NVAX Plus is capable of running in a CMOS/TTL environment. 

18.9.1 Power Supply 

In CMOS mode the VSS pins are connected to 0.0V, and the VDD pins are connected to 3.3V, +/- 5%. 

To prevent damage to NVAX Plus, it is important that the 3.3V power supply be stable before any 
of NVAX Plus's input or bidirectional pins be allowed to rise above 4.0V. System designers should 
note that this is exactly opposite to the rule used by 5.0V inputs in CMOS-3, so care should be 
taken when "borrowing" power supplies from CMOS-3 systems. 

To help in meeting this requirement, the assertion levels of NVAX Plus's input pins have been 
arranged so that their default state is the electrical low state. This makes them active high, with 
the exception of tagOk_l and dOE_l, which are true by default. 

18.9.2 Input Clocks 

clkIn is expected to be a pair of differential signals generated from an ECL oscillator circuit. It should be 
AC coupled with a nominal DC bias of VDD/2 set by a resistive network. Details are tbd. 

18.9.3 Signal pins 

Input pins are ordinary CMOS inputs with standard TTL levels, see Table 18-5. Once power has 
been applied, the majority of input pins can be driven by 5.0V signals without harming NVAX 
Plus. There are some signals that are sampled before vRef is stable, and these signals can not 
be driven above the power supply. These signals are: 

• dcOk_h 

• tristate_l 

• cont_l 

• eclOut_h 

Output pins are ordinary 3.3V CMOS outputs. Although output signals are rail-to-rail, timing is 
specified to standard TTL levels, see Table 18-5. 

Bidirectional pins are ordinary 3.3V CMOS bidirectional. On input, they act like input pins. On 
output, they drive like output pins. 

Once power has been applied, bidirectional pins can be driven to 5.0V without harming NVAX 
Plus (it is not necessary to use static RAMS with 3.3V outputs). 

Table 18-5: CMOS DC Characteristics

Parameter                                     Requirements
Symbol  Description                           Min    Max     Units  Test Conditions
------  ------------------------------------  -----  ------  -----  ------------------------
TTL Inputs/Outputs
Vih     High level input voltage              2.0            V
Vil     Low level input voltage                      0.8     V
Voh     High level output voltage             2.4            V      Ioh = -100uA
Vol     Low level output voltage                     0.4     V      Iol = 3.2mA
Power/Leakage
Icin    Clock input leakage                   -50    50      uA     -0.5<Vin<5.5V
Iil     Input leakage current                 10     10      uA     0<Vin<Vdd V
Iol     Output leakage current (three-state)  -10    -10     uA
Idd     Active supply current                        ?4.5?   A      NVAX Plus @ 14.0ns cycle
                                                     ?6.0?   A      NVAX Plus @ 10.0ns cycle
                                                     ?4.5?   A      NVAX Plus @ 14.0ns cycle
                                                                    15=0 C, Vdd=3.GV
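
The TTL thresholds in Table 18-5 can be turned into a small level check. This is a sketch only: the constant and function names are illustrative, and the noise-margin arithmetic is a standard derivation from the table values rather than a figure quoted by the specification.

```python
# Thresholds taken from Table 18-5 (volts).
VIH_MIN = 2.0  # minimum voltage recognised as a high input
VIL_MAX = 0.8  # maximum voltage recognised as a low input
VOH_MIN = 2.4  # minimum high output voltage at Ioh = -100uA
VOL_MAX = 0.4  # maximum low output voltage at Iol = 3.2mA


def classify_input(volts: float) -> str:
    """Classify an input voltage against the TTL input thresholds."""
    if volts >= VIH_MIN:
        return "high"
    if volts <= VIL_MAX:
        return "low"
    return "undefined"


def noise_margins() -> tuple:
    """Return (high margin, low margin) in volts, rounded to millivolts."""
    return (round(VOH_MIN - VIH_MIN, 3), round(VIL_MAX - VOL_MAX, 3))


print(classify_input(3.3), classify_input(1.5), classify_input(0.2))
print(noise_margins())
```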



DIGITAL CONFIDENTIAL 



AC/DC Characteristics 18-9 



NVAX Plus CPU Chip Functional Specification, Revision 0.3, October 1991 



18.10 Timing Overview 

NVAX Plus cpu cycles consist of four phases (phi1, phi2, phi3, phi4). In system operation the period
of each phase is equal to the clkIn_h,_l period. In the tester environment the input clock is derived
from an 'XOR' of clkIn_h,_l and testClkIn_h,_l. This produces an input frequency 2X that which
can be driven to the clock inputs from tester input signals. The system clock sysClkOut1_h,_l
can be programmed to be 2, 3, or 4 times the cpu cycle period. The LASER and PVN systems
both program sysClkOut1_h,_l for 2X the cpu cycle. Most testing of NVAX Plus will be done with
sysClkOut1_h,_l set for 2X the cpu cycle.



[Figure: clock timing diagram. Waveforms shown: 200 MHz clkIn_h, 200 MHz testClkIn, 400 MHz clkIn_h, phi_1, phi_2, phi_3, phi_4, cpuClkOut_h, sysClkOut1_h. Legend: SF = sys_first, SL = sys_last, DL = drive_last.]

The CPU_CLK runs at a cycle time as fast as 10ns, and SYS_CLK can be set to 2, 3, or 4, times the 
CPU cycle time. 
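
The clock relationships described in this section reduce to simple arithmetic. The sketch below assumes only what the text states: four phases per cpu cycle, each phase equal to one clkIn period, and a system clock of 2, 3, or 4 cpu cycles. The function names are illustrative.

```python
PHASES_PER_CYCLE = 4               # phi1 through phi4
ALLOWED_SYSCLK_RATIOS = (2, 3, 4)  # sysClkOut1 period in cpu cycles


def cpu_cycle_ns(clkin_mhz: float) -> float:
    """Cpu cycle time in ns when each phase lasts one clkIn period."""
    phase_ns = 1000.0 / clkin_mhz
    return PHASES_PER_CYCLE * phase_ns


def sys_clk_period_ns(cycle_ns: float, ratio: int) -> float:
    """System clock period in ns for a programmed ratio of 2, 3, or 4."""
    if ratio not in ALLOWED_SYSCLK_RATIOS:
        raise ValueError(f"ratio must be one of {ALLOWED_SYSCLK_RATIOS}, got {ratio}")
    return ratio * cycle_ns


cycle = cpu_cycle_ns(400.0)          # 400 MHz input clock
print(cycle)                         # cpu cycle time in ns
print(sys_clk_period_ns(cycle, 2))   # 2X setting used by LASER and PVN
```

With a 400 MHz input clock the phase is 2.5ns, which gives the 10ns cycle time quoted above.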



18.11 Signals 

The following table lists all of the 291 signals on the NVAX_PLUS chip. In the "type" column, an
"I" means a pin is an input, an "O" means the pin is an output, a "T" means the pin is a tristate
output, and a "B" means the pin is tristate and bidirectional. In the "timing" column, "SF" means
sysClkOut1 first cpu cycle, "SL" means sysClkOut1 last cpu cycle, and "DL" means drive_clock last
cpu cycle, which is sys_first when sysclock and cache speed are both 2X the cpu cycle. For inputs,
the phase column indicates the phase at which the input signals change. For outputs, the phase
column indicates the reference from which timing is specified in the function column.
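
The type and timing codes described above can be decoded with a small helper. This is a sketch under the assumption that a phase entry is an optional SF, SL, or DL prefix followed by a phase number from 1 to 4, with multiple entries separated by commas (for example "DL3,1"); the parser is illustrative and not defined by the specification.

```python
import re

PIN_TYPES = {
    "I": "input",
    "O": "output",
    "T": "tristate output",
    "B": "tristate bidirectional",
}

TIMING_REFS = {
    "SF": "sysClkOut1 first cpu cycle",
    "SL": "sysClkOut1 last cpu cycle",
    "DL": "drive_clock last cpu cycle",
}

_ENTRY = re.compile(r"(SF|SL|DL)?([1-4])")


def parse_phase_field(field: str) -> list:
    """Split a phase field into (timing reference or None, phase number) pairs."""
    entries = []
    for part in field.split(","):
        match = _ENTRY.fullmatch(part.strip())
        if match is None:
            raise ValueError(f"unrecognised phase entry: {part!r}")
        entries.append((match.group(1), int(match.group(2))))
    return entries


print(parse_phase_field("SF1"))
print(parse_phase_field("DL3,1"))
print(PIN_TYPES["B"], "/", TIMING_REFS["DL"])
```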

Table 18-6: NVAX_PLUS Signals

Signal Name           Count  Type  Phase  Function
--------------------  -----  ----  -----  ------------------------------------------------------------
clkIn_h,_l            2      I     1,2    Clock input
testClkIn_h,_l        2      I     2,3    Clock input for testing
clk_rst_h             1      I     1      Put cpu and sys_clk timing gen. to known state, clkIn & testClkIn stopped
cpuClkOut_h           1      O     1,3    CPU clock output, phase 1 & 3 every cpu cycle
sysClkOut1_h,_l       2      O     1      System clock output
sysClkOut2_h,_l       2      O            System clock output, delayed
adr_h[33..32]         2      T     DL3    Address bus 33,32
adr_h[31..17]         15     B     DL3    Address bus tag section
adr_h[16..5]          12     T     DL3    Address bus index section
dataA_h[4]            1      T     DL3    data A[4]
dataA_h[3]            1      O     DL3    data A[3]
data_h[127..0]        128    B     1      Data bus, df1 for write_hit, sf1 for write_block or STxC
data_h[127..0]        128    B     4      Data bus, dl4 for cache_hit, sl4 for read_block or LDxL
check_h[27..0]        28     B     1,4    Check bit bus, same timing as data_h
dOE_l                 1      I     SF1    Data bus output enable, 9.3/6.0 before phi_1
dRAck_h[2..0]         3      I     SF1    read acknowledge, 9.3/6.5 before phi_1
tagAdr_h[31..20]      12     I     DL3    Tag address [31..20], setup by drive_last phi 4
tagAdr_h[19]          1      B            Tag address [19] inputs DL3, Parallel Port[10] if enabled
tagAdr_h[18]          1      B            Tag address [18] inputs DL3, Parallel Port[9] if enabled
tagAdr_h[17]          1      B            Tag address [17] inputs DL3, Parallel Port[8] if enabled
tagEq_l               1      O            Tag compare output, valid 17ns after tagAdr_h & adr_h
tagCEOE_h             1      O     2      tagCtl and tagAdr CE/OE
tagCtlWE_h            1      O     2      tagCtl WE
tagCtlV_h             1      B     DL3,1  Tag valid, inputs drive_last phi_3, outputs drive_first phi_1
tagCtlS_h             1      B     DL3,1  Tag shared, inputs drive_last phi_3, outputs drive_first phi_1
tagCtlD_h             1      B     DL3,1  Tag dirty, inputs drive_last phi_3, outputs drive_first phi_1
tagCtlP_h             1      B     DL3,1  Tag V/S/D parity, inputs drive_last phi_3, outputs drive_first phi_1
tagAdrP_h             1      I     DL4    Tag address parity, inputs drive_last phi_4
tagOk_h,_l            2      I     2,4    Tag access from CPU is ok, phi 2 read tagok, phi 4 write tagok
dataCEOE_h[3..0]      4      O     2      data CE/OE, longword
dataWE_h[3..0]        4      O     2      data WE, longword
holdReq_h             1      I     SF1    Hold request, 4.8 before phi_1
holdAck_h             1      O            Hold acknowledge
cReq_h[2..0]          3      O     SF1    Cycle request 1.5/3.5 after sysClkOut1 (phi_1) if cAck setup=9.3/5
cWMask_h[7..0]        8      O     SF1    Cycle write mask, 1.5 after sysClkOut1 (phi_1)
cAck_h[2..0]          3      I     SF1    Cycle acknowledge, 9.3/5 before phi_1 of sysClkOut1
iAdr_h[12..5]         8      I     SF1    Invalidate address, 4.5 before phi_1 of sysClkOut1
pInvReq_h[1..0]       2      I     SF1    Invalidate request for Pcache, 4.5 before phi_1 of sysClkOut1
pMapWE_h[1..0]        2      O     3      Backmap WE, Pcache
err_h/irq_h[5]        1      I     SF1    External error interrupt, synchronized with phi_4 and sys_first
halt_h/irq_h[4]       1      I     SF1    Halt interrupt, synchronized with phi_4 and sys_first
irq_h[3..0]           4      I     SF1    Interrupt requests, synchronized with phi_4 and sys_first
tagAdr_h[33..32]      2      O     4      Parallel port [7:6] if enabled
pp_data_h[11]         1      B     4,2    Parallel Test Port Data, MAB clock, driver at phi_4, send phi_2 in MAB
pp_data_h[5..0]       6      B     4      Dedicated Parallel Test Port Data
osc16in_h             1      I     SF1    Interval timer 16MHz oscillator input
sRomOE_l              1      O     SF1    Serial ROM output enable
sRomClk_h             1      O     SF1    Serial ROM clock/Tx data
sRomD_h               1      I     SF1    Serial ROM data/Rx data
icMode_h[1]           1      I     SF1    Enables pp_cmd_h<2:0> for test mode
icMode[0]/pp_cmd[2]   1      I     SF1    Serial ROM fast fill, sRomFast_h/used as pp_cmd[2] in test mode
pp_cmd[1:0]           2      I     SF1    EV dWSel_h[1..0] used to select port function in test mode
dcOk_h                1      I     SF1    Power and clocks ok
reset_l               1      I     SF1    Reset
tristate_l            1      I     SF1    Tristate for testing
cont_l                1      I     SF1    Continuity for testing
test_mode_h           1      I     SF1    Enables pull-downs on check_h bits, was eclOut_h
vref                  1      I            Input reference/not used by NVAX Plus
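
As a cross-check, the per-signal counts in Table 18-6 can be summed and compared with the 291 signals quoted in the text. The sketch below transcribes the counts by hand. Two entries are assumptions: the name sysClkOut2_h,_l for the delayed system clock (taken from the pinout chapter) and a count of 1 for halt_h/irq_h[4], whose count is illegible in the source. data_h[127..0] is listed twice in the table (write and read timing) and is counted once here.

```python
# Counts transcribed from Table 18-6; see the assumptions noted in the text above.
SIGNAL_COUNTS = {
    "clkIn_h,_l": 2, "testClkIn_h,_l": 2, "clk_rst_h": 1, "cpuClkOut_h": 1,
    "sysClkOut1_h,_l": 2,
    "sysClkOut2_h,_l": 2,      # assumed name for the delayed system clock
    "adr_h[33..32]": 2, "adr_h[31..17]": 15, "adr_h[16..5]": 12,
    "dataA_h[4]": 1, "dataA_h[3]": 1,
    "data_h[127..0]": 128,     # listed twice in the table, counted once
    "check_h[27..0]": 28, "dOE_l": 1, "dRAck_h[2..0]": 3,
    "tagAdr_h[31..20]": 12, "tagAdr_h[19]": 1, "tagAdr_h[18]": 1, "tagAdr_h[17]": 1,
    "tagEq_l": 1, "tagCEOE_h": 1, "tagCtlWE_h": 1,
    "tagCtlV_h": 1, "tagCtlS_h": 1, "tagCtlD_h": 1, "tagCtlP_h": 1,
    "tagAdrP_h": 1, "tagOk_h,_l": 2,
    "dataCEOE_h[3..0]": 4, "dataWE_h[3..0]": 4,
    "holdReq_h": 1, "holdAck_h": 1,
    "cReq_h[2..0]": 3, "cWMask_h[7..0]": 8, "cAck_h[2..0]": 3,
    "iAdr_h[12..5]": 8, "pInvReq_h[1..0]": 2, "pMapWE_h[1..0]": 2,
    "err_h/irq_h[5]": 1,
    "halt_h/irq_h[4]": 1,      # assumed count of 1
    "irq_h[3..0]": 4, "tagAdr_h[33..32]": 2,
    "pp_data_h[11]": 1, "pp_data_h[5..0]": 6, "osc16in_h": 1,
    "sRomOE_l": 1, "sRomClk_h": 1, "sRomD_h": 1,
    "icMode_h[1]": 1, "icMode[0]/pp_cmd[2]": 1, "pp_cmd[1:0]": 2,
    "dcOk_h": 1, "reset_l": 1, "tristate_l": 1, "cont_l": 1,
    "test_mode_h": 1, "vref": 1,
}


def total_signals(counts: dict) -> int:
    """Sum the pin counts of every table entry."""
    return sum(counts.values())


print(total_signals(SIGNAL_COUNTS))
```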
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CACHE READ HIT TIMING 

[Figure: cache read hit timing diagram. Cycle states include FILL, FILL, and IDLE. Signals shown: adr<31:5>, dataA<4>, tagCEOE, tagAdr, tagCtl V/D/S/P, tagOk_h, dataCEOE, data_h. Annotations: read tagOk; tag and 1st octaword valid; 2nd octaword valid.]
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CACHE WRITE HIT TIMING 

[Figure: cache write hit timing diagram. Cycle states: PROBE, COMPARE, WRITE, IDLE. Signals shown: adr<31:5>, tagCEOE, tagAdr, tagCtl V/D/S/P, tagWE, tagOk_h, dataCEOE, data_h, dataWE. Annotations: write tagOk, ok past COMPARE; tag valid; WE trailing edge; data hold 3 phases.]
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CACHE BYTE/WORD WRITE HIT TIMING 

[Figure: cache byte/word write hit timing diagram. Cycle states: BWR_PROBE, BWR_COMPARE, MERGE, BWR, IDLE. Signals shown: adr<31:5>, tagCEOE, tagAdr, tagCtl V/D/S/P, tagWE, tagOk_h, dataCEOE, data_h, dataWE. Annotations: write tagOk, ok past MERGE; tag and merge data valid; WE trailing edge; data hold 3 phases.]



18.12 Revision History 
Table 18-7: Revision History 



Who          When         Description of change
-----------  -----------  ------------------------------------
Gil Wolrich  15-Apr-1991  first edit from EV4 characteristics.
Gil Wolrich  01-Jul-1991  update and timing diagrams.
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Chapter 1 9 
NVAX Plus Pinout 

19.1 Overview

This chapter contains the entire NVAX Plus pinout ordered by PGA location. In addition, it
contains a list of differences between the NVAX Plus pinout and the EV4 pinout. 
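
PGA locations in the pinout combine a row label with a column number, for example A1 or AD24. The sketch below parses such a location into a zero-based row index and a column number. The row sequence (A through AD, skipping I, O, Q, S, X, and Z) is inferred from the locations that appear in the pinout listing and is an assumption, as are the function and constant names.

```python
import re

# Row labels observed in the pinout listing, in order (assumed complete).
PGA_ROWS = [
    "A", "B", "C", "D", "E", "F", "G", "H", "J", "K", "L", "M",
    "N", "P", "R", "T", "U", "V", "W", "Y", "AA", "AB", "AC", "AD",
]
PGA_COLUMNS = 24

_LOCATION = re.compile(r"([A-Z]{1,2})(\d{1,2})")


def parse_pga_location(loc: str) -> tuple:
    """Return (zero-based row index, column number) for a location such as 'AB14'."""
    match = _LOCATION.fullmatch(loc.strip().upper())
    if match is None or match.group(1) not in PGA_ROWS:
        raise ValueError(f"not a PGA location: {loc!r}")
    column = int(match.group(2))
    if not 1 <= column <= PGA_COLUMNS:
        raise ValueError(f"column out of range in {loc!r}")
    return PGA_ROWS.index(match.group(1)), column


print(parse_pga_location("A1"))
print(parse_pga_location("AB14"))
```

Sorting pins by this tuple reproduces the row-major order used by the listing.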
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19.2 NVAX Plus Pinout 

PGA   PAD  PIN
LOC.  No.  No.  TYPE  NAME
----  ---  ---  ----  --------------------------
A1    009  001  B     data_h<33>
A2    008  002  B     data_h<97>
A3    004  003  B     data_h<98>
A4    426  004  B     data_h<100>
A5    421  005  B     data_h<38>
A6    418  006  B     check_h<27>
A7    412  007  B     data_h<104>
A8    407  008  B     data_h<42>
A9    403  009  B     data_h<44>
A10   396  010  B     data_h<109>
A11   391  011  B     data_h<47>
A12   387  012  B     data_h<49>
A13   386  013  B     data_h<113>
A14   379  014  B     data_h<52>
A15   373  015  B     check_h<12>
A16   367  016  B     data_h<55>
A17   364  017  B     data_h<120>
A18   358  018  B     data_h<122>
A19   355  019  B     check_h<7>
A20   346  020  B     data_h<60>
A21   347  021  B     data_h<61>
A22   343  022  B     data_h<62>
A23   340  023  B     data_h<127>
A24   337  024  B     check_h<9>
B1    014  025  B     check_h<15>
B2    046  026  P     VDD plane
B3    003  027  B     data_h<35>
B4    039  028  P     VSS plane
B5    424  029  B     data_h<101>
B6    054  030  P     VDD plane
B7    413  031  B     data_h<40>
B8    047  032  P     VSS plane
B9    404  033  B     data_h<107>
B10   062  034  P     VDD plane
B11   394  035  B     data_h<110>
B12   055  036  P     VSS plane
B13   383  037  B     data_h<50>
B14   070  038  P     VDD plane
B15   372  039  B     check_h<26>
B16   063  040  P     VSS plane
B17   363  041  B     data_h<57>
B18   076  042  P     VDD plane
B19   354  043  B     check_h<21>
B20   071  044  P     VSS plane
B21   346  045  B     data_h<125>
B22   086  046  P     VDD plane
B23   079  047  P     VSS plane
B24   335  048  B     check_h<8>
C1    016  049  B     check_h<16>
C2    115  050  P     VSS plane
C3    010  051  B     data_h<96>
C4    002  052  B     data_h<99>
C5    425  053  B     data_h<37>
C6    419  054  B     check_h<?>
C7    414  055  B     data_h<103>
C8    410  056  B     data_h<105>
C9    405  057  B     data_h<43>
C10        058  B     data_h<45>
C11   395  059  B     data_h<46>
C12   388  060  B     data_h<112>
C13        061  B     data_h<114>
C14   378  062  B     data_h<116>
C15   371  063  B     data_h<54>
C16   366  064  B     data_h<119>
C17   362  065  B     data_h<121>
C18   357  066  B     check_h<11>
C19   351  067  B     data_h<59>
C20   348  068  B     data_h<124>
C21   342  069  B     data_h<126>
C22   336  070  B     check_h<?>
C23   330  071  I     dRAck_h<0>
C24   331  072  I     pInvReq_h<1>
D1    022  073  B     data_h<94>
D2    017  074  B     check_h<2>
D3    015  075  B     check_h<1>
D4    005  076  B     data_h<34>
D5    427  077  B     data_h<36>
D6    420  078  B     data_h<102>
D7    415  079  B     data_h<39>
D8    411  080  B     data_h<41>
D9    406  081  B     data_h<106>
D10   402  082  B     data_h<108>
D11   396  083  B     check_h<24>
D12   389  084  B     data_h<48>
D13   381  085  B     data_h<51>
D14   375  086  B     data_h<53>
D15   370  087  B     data_h<118>
D16   365  088  B     data_h<56>
D17   359  089  B     data_h<58>
D18   356  090  B     check_h<25>
D19   350  091  B     data_h<123>
D20   341  092  B     data_h<63>
D21   334  093  B     check_h<22>
D22   328  094  I     dRAck_h<2>
D23   152  095  P     VDD plane
D24   325  096  I     dOE_l
E1    023  097  B     data_h<30>
E2    126  098  P     VDD plane
E3    021  099  B     data_h<31>
E4    011  100  B     data_h<32>
E5    226  101  P     VDD plane
E6    255  102  P     VSS plane
E7    234  103  P     VDD plane
E8    243  104  P     VSS plane
E9    242  105  P     VDD plane
E10   255  106  P     VSS plane
E11   397  107  B     check_h<10>
E12   390  108  B     data_h<111>
E13   380  109  B     data_h<115>
E14   374  110  B     data_h<117>
E15   266  111  P     VDD plane
E16   279  112  P     VSS plane
E17   278  113  P     VDD plane
E18   291  114  P     VSS plane
E19   290  115  P     VDD plane
E20   303  116  P     VSS plane
E21   329  117  I     dRAck_h<1>
E22   324  118  I     pp_cmd_h<0>
E23   323  119  I     pp_cmd_h<1>
E24   322  120  I     cAck_h<0>
F1    026  121  B     data_h<92>
F2    027  122  B     data_h<29>
F3    026  123  B     data_h<93>
F4    020  124  B     data_h<95>
F5    231  125  P     VSS plane
F6    230  126  P     VDD plane
F7    239  127  P     VSS plane
F8    238  128  P     VDD plane
F9    249  129  P     VSS plane
F10   246  130  P     VDD plane
F11   261  131  P     VSS plane
F12   254  132  P     VDD plane
F13   267  133  P     VSS plane
F14   260  134  P     VDD plane
F15   273  135  P     VSS plane
F16   272  136  P     VDD plane
F17   285  137  P     VSS plane
F18   284  138  P     VDD plane
F19   297  139  P     VSS plane
F20   296  140  P     VDD plane
F21   319  141  I     cAck_h<1>
F22   316  142  I     cAck_h<2>
F23   155  143  P     VSS plane
F24   317  144  I     holdReq_h
G1    033  145  B     data_h<27>
G2    111  146  P     VSS plane
G3    032  147  B     data_h<91>
G4    029  148  B     data_h<28>
G5    360  149  P     VDD plane
G6         150  P     VSS plane
G19   133  151  P     VDD plane
G20   N/A  152  P     VSS plane
G21   316  153  O     holdAck_h
G22   313  154  O     dataCEOE_h<0>
G23   312  155  O     dataCEOE_h<1>
G24   311  156  O     dataCEOE_h<2>
H1    037  157  B     check_h<4>
H2    036  158  B     check_h<18>
H3    035  159  B     check_h<0>
H4    034  160  B     check_h<14>
H5    361  161  P     VSS plane
H6    352  162  P     VDD plane
H19   N/A  163  P     VSS plane
H20   428  164  P     VDD plane
H21   310  165  O     dataCEOE_h<3>
H22   307  166  O     tagCtlWE_h
H23   142  167  P     VDD plane
H24   306  168  O     cWMask_h<0>
J1    042  169  B     data_h<89>
J2    118  170  P     VDD plane
J3    041  171  B     data_h<26>
J4    040  172  B     data_h<90>
J5    344  173  P     VDD plane
J6    353  174  P     VSS plane
J19   422  175  P     VDD plane
J20   N/A  176  P     VSS plane
J21   305  177  O     cWMask_h<1>
J22   304  178  O     cWMask_h<2>
J23   301  179  O     cWMask_h<3>
J24   300  180  O     cWMask_h<4>
K1    048  181  B     data_h<87>
K2    045  182  B     data_h<24>
K3    044  183  B     data_h<88>
K4    043  184  B     data_h<25>
K5    345  185  P     VSS plane
K6    338  186  P     VDD plane
K19   423  187  P     VSS plane
K20   416  188  P     VDD plane
K21   299  189  O     cWMask_h<5>
K22   296  190  O     cWMask_h<6>
K23   147  191  P     VSS plane
K24   295  192  O     cWMask_h<7>
L1    052  193  B     check_h<?>
L2    103  194  P     VSS plane
L3    051  195  B     data_h<22>
L4    050  196  B     data_h<86>
L5    049  197  B     data_h<23>
L6    339  198  P     VSS plane
L19   406  199  P     VDD plane
L20   294  200  O     dataWE_h<0>
L21   293  201  O     dataWE_h<1>
L22   292  202  O     dataWE_h<2>
L23   289  203  O     dataWE_h<3>
L24   288  204  O     pMapWE_h<0>
M1    059  205  B     data_h<20>
M2    058  206  B     data_h<84>
M3    057  207  B     data_h<21>
M4    056  208  B     data_h<85>
M5    053  209  B     check_h<5>
M6    332  210  P     VDD plane
M19   417  211  P     VSS plane
M20   287  212  O     cReq_h<0>
M21   286  213  O     cReq_h<1>
M22   283  214  O     cReq_h<2>
M23   140  215  P     VDD plane
M24   282  216  O     pMapWE_h<1>
N1    060  217  B     data_h<83>
N2    110  218  P     VDD plane
N3    061  219  B     data_h<19>
N4    064  220  B     data_h<82>
N5    065  221  B     data_h<18>
N6    233  222  P     VSS plane
N19   400  223  P     VDD plane
N20   275  224  I     tagOk_l
N21   276  225  I     tagOk_h
N22   277  226  O     dataA_h<4>
N23   280  227  O     dataA_h<3>
N24   281  228  O     tagCEOE_h
P1    066  229  B     data_h<81>
P2    067  230  B     data_h<17>
P3    066  231  B     data_h<80>
P4    069  232  B     data_h<16>
P5    072  233  B     data_h<79>
P6    326  234  P     VDD plane
P19   409  235  P     VSS plane
P20   269  236  B     tagCtlS_h
P21   270  237  B     tagCtlD_h
P22   271  238  B     tagCtlP_h
P23   145  239  P     VSS plane
P24   274  240  O     tagEq_l
R1    073  241  B     data_h<15>
R2    095  242  P     VSS plane
R3    074  243  B     data_h<78>
R4    075  244  B     data_h<14>
R5    320  245  P     VDD plane
R6    327  246  P     VSS plane
R19   392  247  P     VDD plane
R20   401  248  P     VSS plane
R21   263  249  B     tagadr_h<19>/pp_data_h<10>
R22   264  250  B     tagadr_h<18>/pp_data_h<9>
R23   265  251  B     tagadr_h<17>/pp_data_h<8>
R24   268  252  B     tagCtlV_h
T1    076  253  B     check_h<17>
T2    077  254  B     check_h<3>
T3    080  255  B     data_h<77>
T4    081  256  B     data_h<13>
T5    321  257  P     VSS plane
T6    314  258  P     VDD plane
T19   393  259  P     VSS plane
T20   384  260  P     VDD plane
T21   258  261  I     tagadr_h<22>
T22   259  262  I     tagadr_h<21>
T23   138  263  P     VDD plane
T24   262  264  I     tagadr_h<20>
U1    062  265  B     data_h<76>
U2    102  266  P     VDD plane
U3    083  267  B     data_h<12>
U4    084  268  B     data_h<75>
U5    306  269  P     VDD plane
U6    315  270  P     VSS plane
U19   376  271  P     VDD plane
U20   385  272  P     VSS plane
U21   252  273  I     tagadr_h<26>
U22   253  274  I     tagadr_h<25>
U23   256  275  I     tagadr_h<24>
U24   257  276  I     tagadr_h<23>
V1    085  277  B     data_h<11>
V2    088  278  B     data_h<74>
V3    089  279  B     data_h<10>
V4    090  280  B     data_h<73>
V5    309  281  P     VSS plane
V6    302  282  P     VDD plane
V19   377  283  P     VSS plane
V20   368  284  P     VDD plane
V21   247  285  I     tagadr_h<29>
V22   250  286  I     tagadr_h<28>
V23   143  287  P     VSS plane
V24   251  288  I     tagadr_h<27>
W1    091  289  B     data_h<9>
W2    067  290  P     VSS plane
W3    092  291  B     data_h<72>
W4    099  292  B     check_h<6>
W5    154  293  P     VDD plane
W6    163  294  P     VSS plane
W7    166  295  P     VDD plane
W8    175  296  P     VSS plane
W9    139  297  I     testClkIn_h
W10   141  298  I     testClkIn_l
W11   180  299  P     VDD plane
W12   167  300  I     clkIn_h
W13   169  301  I     clkIn_l
W14   199  302  P     VSS plane
W15   196  303  P     VDD plane
W16   211  304  P     VSS plane
W17   210  305  P     VDD plane
W18   219  306  P     VSS plane
W19   218  307  P     VDD plane
W20   227  308  P     VSS plane
W21   240  309  I     tagadrP_h
W22   244  310  O     pp_data_h<6>
W23   245  311  I     tagadr_h<31>
W24   246  312  I     tagadr_h<30>
Y1    093  313  B     data_h<8>
Y2    096  314  B     data_h<71>
Y3    097  315  B     data_h<7>
Y4    106  316  B     data_h<68>
Y5    161  317  P     VSS plane
Y6    166  318  P     VDD plane
Y7    165  319  P     VSS plane
Y8    170  320  P     VDD plane
Y9    181  321  P     VSS plane
Y10   174  322  P     VDD plane
Y11   167  323  P     VSS plane
Y12   186  324  P     VDD plane
Y13   193  325  P     VSS plane
Y14   192  326  P     VDD plane
Y15   205  327  P     VSS plane
Y16   204  328  P     VDD plane
Y17   215  329  P     VSS plane
Y18   214  330  P     VDD plane
Y19   223  331  P     VSS plane
Y20   222  332  P     VDD plane
Y21   232  333  O     adr_h<8>
Y22   237  334  O     adr_h<5>
Y23   132  335  P     VDD plane
Y24   241  336  T     pp_data_h<7>
AA1   096  337  B     check_h<20>
AA2   094  338  P     VDD plane
AA3   105  339  B     data_h<5>
AA4   112  340  B     data_h<66>
AA5   117  341  B     data_h<0>
AA6        342  I     iAdr_h<6>
AA7        343  I     iAdr_h<10>
AA8   136  344  I     vRef
AA9   144  345  O     sysClkOut2_h
AA10  146  346  O     sysClkOut2_l
AA11  157  347        pp_data_h<1>
AA12  162  348  O     sysClkOut1_h
AA13  164  349  O     sysClkOut1_l
AA14  171  350  I     cont_l
AA15  182  351  I     err_h/(irq_h<5>)
AA16  188  352  T     pp_data_h<11>
AA17  191  353  B     adr_h<31>
AA18  197  354  B     adr_h<27>
AA19  202  355  B     adr_h<24>
AA20  213  356  B     adr_h<17>
AA21  217  357  O     adr_h<15>
AA22  225  358  O     adr_h<11>
AA23  233  359  O     adr_h<7>
AA24  236  360  O     adr_h<6>
AB1   100  361  B     data_h<70>
AB2   104  362  B     data_h<69>
AB3   108  363  B     data_h<67>
AB4   113  364  B     data_h<2>
AB5   116  365  B     data_h<64>
AB6   122  366  I     iAdr_h<7>
AB7   129  367  I     iAdr_h<12>
AB8   137  368  I     reset_l
AB9   148  369  I     sRomD_h
AB10  149  370  O     sRomOE_l
AB11  153  371  O     cpuClkOut_h
AB12  159  372  I     dcOk_h
AB13  160  373  I     triState_l
AB14  172  374  I     icMode_h<0>
AB15  179  375  I     halt_h/(irq_h<4>)
AB16  185  376  T     pp_data_h<3>
AB17  190  377  B     adr_h<32>
AB18  196  378  B     adr_h<28>
AB19  201  379  B     adr_h<25>
AB20  207  380  B     adr_h<21>
AB21  212  381  B     adr_h<18>
AB22  220  382  O     adr_h<14>
AB23  127  383  P     VSS plane
AB24  229  384  O     adr_h<9>
AC1   101  385  B     data_h<6>
AC2   001  386  P     VSS plane
AC3   006  387  P     VDD plane
AC4   114  388  B     data_h<65>
AC5   007  389  P     VSS plane
AC6   123  390  I     iAdr_h<8>
AC7   012  391  P     VDD plane
AC8   128  392  I     iAdr_h<11>
AC9   013  393  P     VSS plane
AC10  150  394  O     sRomClk_h
AC11  016  395  P     VDD plane
AC12  158  396  I     osc16M_h
AC13  019  397  P     VSS plane
AC14  177  398  I     irq_h<2>
AC15  024  399  P     VDD plane
AC16  184  400  T     pp_data_h<4>
AC17  025  401  P     VSS plane
AC18  195  402  B     adr_h<29>
AC19  030  403  P     VDD plane
AC20  206  404  B     adr_h<22>
AC21  031  405  P     VSS plane
AC22  216  406  O     adr_h<16>
AC23  038  407  P     VDD plane
AC24  226  408  O     adr_h<10>
AD2   107  409  B     data_h<4>
AD3   109  410  B     data_h<3>
AD4   115  411  B     data_h<1>
AD5   120  412  I     iAdr_h<5>
AD6   124  413  I     iAdr_h<9>
AD7   131  414  I     clk_rst_h
AD8   135  415  I     test_mode_h
AD9   130  416  I     pInvReq_h<0>
AD10  134  417        pp_data_h<0>
AD11  151  418  T     pp_data_h<2>
AD12  156  419  I     icMode_h<1>
AD13  173  420  I     irq_h<0>
AD14  176  421  I     irq_h<1>
AD15  178  422  I     irq_h<3>
AD16  182  423        pp_data_h<5>
AD17  189  424  B     adr_h<33>
AD18  194  425  B     adr_h<30>
AD19  200  426  B     adr_h<26>
AD20  203  427  B     adr_h<23>
AD21  208  428  B     adr_h<20>
AD22  209  429  B     adr_h<19>
AD23  221  430  O     adr_h<13>
AD24  224  431  O     adr_h<12>
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19.3 NVAX Plus/EV4 Pinout Differences 



The following table shows the differences between the EV4 chip pinout and the NVAX Plus 
pinout. 



PGA   PAD  SIG  EV4                  NVAX Plus
LOC.  No.  No.  TYPE  NAME           TYPE  NAME
----  ---  ---  ----  -------------  ----  -------------
E22   324  118  I     dWSel_h<0>     I     pp_cmd_h<0>
E23   323  119  I     dWSel_h<1>     I     pp_cmd_h<1>
E21   329  117  I     dRAck_h<1>     I     dRAck_h<1>
L24   288  204  O     dMapWE_h       O     pMapWE_h<0>
AD9   130  416  I     dInvReq_h      I     pInvReq_h<0>
M24   282  216  N     spare<0>       O     pMapWE_h<1>
AD7   131  414  N     spare<1>       I     clk_rst_h
AD10  134  417  N     spare<2>             pp_data_h<0>
C24   231  072  N     spare<3>       I     pInvReq_h<1>
AD11  151  418  N     spare<4>       O     pp_data_h<2>
AC12  158  396  N     spare<5>       I     osc16M_h
AA11  157  347  N     spare<6>       O     pp_data_h<1>
AD16  183  423  N     spare<7>       O     pp_data_h<5>
AA16  186  352  N     spare<8>       O     pp_data_h<11>
AB16  185  376  T     perf_cnt_h<0>  O     pp_data_h<3>
AC16  184  400  T     perf_cnt_h<1>  O     pp_data_h<4>
AD8   135  415  I     eclOut_h       I     test_mode_h
R23   265  251  I     tagadr_h<17>   B     tagadr_h<17>
R22   264  250  I     tagadr_h<18>   B     tagadr_h<18>
R21   263  249  I     tagadr_h<19>   B     tagadr_h<19>
W22   244  310  I     tagadr_h<32>   O     pp_data_h<6>
Y24   241  336  I     tagadr_h<33>   O     pp_data_h<7>
Y22   237  334  B     adr_h<5>       O     adr_h<5>
AA24  236  360  B     adr_h<6>       O     adr_h<6>
AA23  233  359  B     adr_h<7>       O     adr_h<7>
Y21   232  333  B     adr_h<8>       O     adr_h<8>
AB24  229  384  B     adr_h<9>       O     adr_h<9>
AC24  228  408  B     adr_h<10>      O     adr_h<10>
AA22  225  358  B     adr_h<11>      O     adr_h<11>
AD24  224  431  B     adr_h<12>      O     adr_h<12>
AD23  221  430  B     adr_h<13>      O     adr_h<13>
AB22  220  382  B     adr_h<14>      O     adr_h<14>
AA21  217  357  B     adr_h<15>      O     adr_h<15>
AC22  216  406  B     adr_h<16>      O     adr_h<16>

PGA   PAD  SIG  EV4                  NVAX Plus
LOC.  No.  No.  TYPE  NAME           TYPE  NAME           Notes
----  ---  ---  ----  -------------  ----  -------------  ------------------------------------
AD13  173  420  I     irq_h<0>       I     irq_h<0>       interrupt at IPL20 (NVAX Plus only)
AD14  176  421  I     irq_h<1>       I     irq_h<1>       interrupt at IPL21 (NVAX Plus only)
AC14  177  398  I     irq_h<2>       I     irq_h<2>       interrupt at IPL22 (NVAX Plus only)
AD15  178  422  I     irq_h<3>       I     irq_h<3>       interrupt at IPL23 (NVAX Plus only)
AB15  179  375  I     irq_h<4>       I     halt_h         halt interrupt for NVAX Plus
AA15  182  351  I     irq_h<5>       I     err_h          hard error interrupt for NVAX Plus

NOTE (1): PGA LOC. E21 is specified in version 2.0 of the EV specification as
dRAck_h<1> for EV4 and pp_cmd_h<2> for NVAX Plus. This has been changed since
version 2.0 of the EV specification was published. PGA LOC. E21 is now
dRAck_h<1> for both the EV4 and NVAX Plus chips. The NVAX Plus chip
now uses PGA LOC. AB14, icMode_h<0>, as both sROMfast and pp_cmd_h<2>.

In addition to the signals listed in the EV4 specification, the EV
irq_h<5:0> interrupt pins are noted because of the difference in
functionality between EV4 and NVAX Plus for these pins.
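
The difference in interrupt pin usage can be summarised as a lookup from the EV4 irq_h bit number to the NVAX Plus name and function. This is a sketch for illustration: the dictionary and function names are hypothetical, and the function strings simply repeat the notes in the table above without interpreting the IPL values.

```python
# EV4 irq_h bit -> (NVAX Plus pin name, NVAX Plus function), from the table above.
NVAX_PLUS_IRQ_PINS = {
    0: ("irq_h<0>", "interrupt at IPL20"),
    1: ("irq_h<1>", "interrupt at IPL21"),
    2: ("irq_h<2>", "interrupt at IPL22"),
    3: ("irq_h<3>", "interrupt at IPL23"),
    4: ("halt_h", "halt interrupt"),
    5: ("err_h", "hard error interrupt"),
}


def nvax_plus_irq_role(ev4_bit: int) -> tuple:
    """Return the NVAX Plus (name, function) for EV4 pin irq_h<ev4_bit>."""
    if ev4_bit not in NVAX_PLUS_IRQ_PINS:
        raise ValueError(f"irq_h has bits 0 through 5, got {ev4_bit}")
    return NVAX_PLUS_IRQ_PINS[ev4_bit]


for bit in range(6):
    name, role = nvax_plus_irq_role(bit)
    print(f"irq_h<{bit}> -> {name}: {role}")
```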



19.4 Revision History 



Table 19-1: Revision History

Who          When         Description of change
-----------  -----------  ------------------------------
Gil Wolrich  21-OCT-1991  Add pinouts to NVAX Plus spec.
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